RATE OF CONVERGENCE AND EDGEWORTH-TYPE EXPANSION IN THE ENTROPIC CENTRAL LIMIT THEOREM



S. G. BOBKOV^{1,4}, G. P. CHISTYAKOV^{2,4}, AND F. GÖTZE^{3,4}

Abstract. An Edgeworth-type expansion is established for the entropy distance to the class 
of normal distributions of sums of i.i.d. random variables or vectors, satisfying minimal 
moment conditions. 



1. Introduction 

Let $(X_n)_{n\ge1}$ be independent identically distributed random variables with mean $EX_1 = 0$ and variance $\mathrm{Var}(X_1) = 1$. According to the central limit theorem, the normalized sums

$$Z_n = \frac{X_1 + \dots + X_n}{\sqrt{n}}$$

are weakly convergent in distribution to the standard normal law: $Z_n \Rightarrow Z$, where $Z \sim N(0,1)$ with density $\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$. A much stronger statement (when applicable), the entropic central limit theorem, indicates that if, for some $n_0$, or equivalently, for all $n \ge n_0$, the $Z_n$ have absolutely continuous distributions with finite entropies $h(Z_n)$, then the entropies converge:

$$h(Z_n) \to h(Z), \quad \text{as } n \to \infty. \tag{1.1}$$

This theorem is due to Barron [Ba]. Some weaker variants of the theorem for regularized distributions were known before; they go back to the work of Linnik [L], initiating an information-theoretic approach to the central limit theorem.

To clarify in which sense (1.1) is strong, let us first recall that, if a random variable $X$ with finite second moment has a density $p(x)$, its entropy

$$h(X) = -\int_{-\infty}^{+\infty} p(x)\log p(x)\,dx$$

is well-defined and is bounded from above by the entropy of the normal random variable $Z$ having the same mean $a$ and the same variance $\sigma^2$ as $X$ (the value $h(X) = -\infty$ is possible). The relative entropy

$$D(X) = D(X\|Z) = h(Z) - h(X) = \int_{-\infty}^{+\infty} p(x)\log\frac{p(x)}{\varphi_{a,\sigma}(x)}\,dx,$$

where $\varphi_{a,\sigma}$ stands for the density of $Z$, is non-negative and serves as a kind of distance to the class of normal laws, or to Gaussianity. This quantity does not depend on the mean or the variance of $X$, and may be related to the total variation distance between the distributions of $X$ and $Z$ by virtue of the Pinsker-type inequality $D(X) \ge \frac{1}{2}\|F_X - F_Z\|_{TV}^2$. This already shows that the entropic convergence (1.1) is stronger than convergence in the total variation norm.

Thus, the entropic central limit theorem may be reformulated as $D(Z_n) \to 0$, as long as $D(Z_{n_0}) < +\infty$ for some $n_0$. This property itself gives rise to a number of intriguing questions, such as the type and the rate of convergence. In particular, it has been proved only recently that the sequence $h(Z_n)$ is non-decreasing, so that $D(Z_n) \downarrow 0$, cf. [A-B-B-N1], [B-M]. This leads to the question of the precise rate at which $D(Z_n)$ tends to zero; however, not much seems to be known about this problem. The best results in this direction are due to Artstein, Ball, Barthe and Naor [A-B-B-N2], and to Barron and Johnson [B-J]. In the i.i.d. case as above, they have obtained the expected asymptotic bound $D(Z_n) = O(1/n)$ under the hypothesis that the distribution of $X_1$ satisfies an analytic inequality of Poincaré type (in [B-J], a restricted Poincaré inequality is used). These inequalities cover a large variety of "nice" probability distributions, which necessarily have a finite exponential moment.

The aim of this paper is to study the rate of $D(Z_n)$, using moment conditions $E|X_1|^s < +\infty$ with fixed values $s \ge 2$, which are comparable to those required for classical Edgeworth-type approximations in the Kolmogorov distance. The cumulants

$$\gamma_r = i^{-r}\,\frac{d^r}{dt^r}\log E\,e^{itX_1}\Big|_{t=0}$$

are then well-defined for all $r \le [s]$ (the integer part of $s$), and one may introduce the functions

$$q_k(x) = \varphi(x)\sum H_{k+2j}(x)\,\frac{1}{r_1!\cdots r_k!}\left(\frac{\gamma_3}{3!}\right)^{r_1}\cdots\left(\frac{\gamma_{k+2}}{(k+2)!}\right)^{r_k}, \tag{1.2}$$

involving the Chebyshev-Hermite polynomials $H_k$. The summation in (1.2) runs over all non-negative integer solutions $(r_1, \dots, r_k)$ to the equation $r_1 + 2r_2 + \dots + kr_k = k$ with $j = r_1 + \dots + r_k$.

The functions $q_k$ are defined for $k = 1, \dots, [s]-2$. They appear in Edgeworth-type expansions, including the local limit theorem, where the $q_k$ are used to construct approximations of the densities of $Z_n$. These results can be applied to obtain an expansion in powers of $1/n$ for the distance $D(Z_n)$. For a multidimensional version of the following Theorem 1.1 for moments of integer order $s \ge 2$, see Theorem 6.1 below.
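The finite sum in (1.2) can be evaluated mechanically. The following Python sketch is only an illustration, not part of the paper: the helper names `hermite`, `solutions` and `q`, and the convention that the cumulants are passed as a dictionary `gamma` with `gamma[r]` equal to $\gamma_r$, are assumptions made for this example.

```python
import math
from typing import Dict, Iterator, List, Tuple


def hermite(n: int, x: float) -> float:
    """Chebyshev-Hermite polynomial H_n(x), via H_{j+1} = x H_j - j H_{j-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for j in range(1, n):
        h_prev, h = h, x * h - j * h_prev
    return h


def solutions(k: int) -> Iterator[Tuple[int, ...]]:
    """All non-negative tuples (r_1, ..., r_k) with r_1 + 2 r_2 + ... + k r_k = k."""
    def rec(i: int, remaining: int, acc: List[int]) -> Iterator[Tuple[int, ...]]:
        if i > k:
            if remaining == 0:
                yield tuple(acc)
            return
        for r in range(remaining // i + 1):
            yield from rec(i + 1, remaining - i * r, acc + [r])
    yield from rec(1, k, [])


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def q(k: int, x: float, gamma: Dict[int, float]) -> float:
    """q_k(x) from (1.2); gamma[r] holds the cumulant gamma_r (keys 3, ..., k + 2)."""
    total = 0.0
    for r in solutions(k):
        j = sum(r)
        coef = 1.0
        for i, r_i in enumerate(r, start=1):
            if r_i:  # only cumulants that actually occur are looked up
                coef *= (gamma[i + 2] / math.factorial(i + 2)) ** r_i / math.factorial(r_i)
        total += coef * hermite(k + 2 * j, x)
    return phi(x) * total


if __name__ == "__main__":
    gamma = {3: 0.7, 4: -0.4}
    print(q(1, 0.8, gamma), q(2, 0.8, gamma))
```

For $k = 1$ the sum has a single term and the sketch returns $\varphi(x)\frac{\gamma_3}{6}H_3(x)$; for $k = 2$ it returns $\varphi(x)\big(\frac{\gamma_4}{24}H_4(x) + \frac{\gamma_3^2}{72}H_6(x)\big)$.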

Theorem 1.1. Let $E|X_1|^s < +\infty$ ($s \ge 2$), and assume $D(Z_{n_0}) < +\infty$ for some $n_0$. Then

$$D(Z_n) = \frac{c_1}{n} + \frac{c_2}{n^2} + \dots + \frac{c_{[(s-2)/2]}}{n^{[(s-2)/2]}} + o\big((n\log n)^{-(s-2)/2}\big). \tag{1.3}$$

Here

$$c_j = \sum_{k=2}^{2j}\frac{(-1)^k}{k(k-1)}\sum\int_{-\infty}^{+\infty} q_{r_1}(x)\cdots q_{r_k}(x)\,\frac{dx}{\varphi(x)^{k-1}}, \tag{1.4}$$

where the inner summation runs over all positive integers $(r_1, \dots, r_k)$ such that $r_1 + \dots + r_k = 2j$.

The implied error term in (1.3) holds uniformly in the class of all distributions of $X_1$ with prescribed tail behaviour $t \to E|X_1|^s\,1_{\{|X_1|>t\}}$ and prescribed values $n_0$ and the entropy distance $D(Z_{n_0})$.

As for the coefficients, each $c_j$ represents a certain polynomial in the cumulants $\gamma_3, \dots, \gamma_{2j+1}$. For example, $c_1 = \frac{1}{12}\gamma_3^2$, and in the case $s = 4$ (1.3) gives

$$D(Z_n) = \frac{1}{12n}\,(EX_1^3)^2 + o\left(\frac{1}{n\log n}\right) \qquad (EX_1^4 < +\infty). \tag{1.5}$$

Thus, under the 4-th moment condition, as conjectured by Johnson ([J], p. 49), we have $D(Z_n) \le \frac{C}{n}$, where the constant depends on the underlying distribution. Actually, this constant may be expressed in terms of $EX_1^4$ and $D(X_1)$ only.
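To see (1.5) at work, one can take a law for which the density of $Z_n$ is explicit. The sketch below assumes, purely for illustration, that $X_1 = E - 1$ with $E$ exponential of mean 1; then $S_n + n$ has the Gamma($n$, 1) law and $\gamma_3 = EX_1^3 = 2$, so (1.5) predicts $nD(Z_n) \to \gamma_3^2/12 = 1/3$. The entropic distance is computed by plain trapezoidal quadrature on a truncated range; the function names and the grid parameters are our own choices.

```python
import math


def log_density(z: float, n: int) -> float:
    """log p_n(z) for Z_n = S_n / sqrt(n), where X_i = E_i - 1 and E_i ~ Exp(1).

    S_n + n ~ Gamma(n, 1), hence p_n(z) = sqrt(n) * g(n + z sqrt(n)) with
    g(y) = y^(n-1) e^(-y) / Gamma(n) for y > 0.
    """
    y = n + z * math.sqrt(n)
    if y <= 0:
        return -math.inf
    return 0.5 * math.log(n) + (n - 1) * math.log(y) - y - math.lgamma(n)


def entropy_distance(n: int, lo: float = -8.0, hi: float = 20.0, h: float = 0.005) -> float:
    """D(Z_n) = int p_n(z) log(p_n(z) / phi(z)) dz by the trapezoidal rule on [lo, hi].

    Z_n already has mean 0 and variance 1, so D(Z_n) is the relative entropy
    with respect to the standard normal density phi.
    """
    steps = int(round((hi - lo) / h))
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        log_p = log_density(z, n)
        if log_p == -math.inf:
            continue  # outside the support of p_n
        log_phi = -0.5 * math.log(2 * math.pi) - z * z / 2
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(log_p) * (log_p - log_phi)
    return total * h


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, n * entropy_distance(n))
```

In this example the printed values of $nD(Z_n)$ should be close to $1/3$ and approach it as $n$ grows; the exponential law is chosen only because all its moments are finite and the Gamma density makes $D(Z_n)$ computable without simulation.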

When $s$ varies in the range $4 \le s < 6$, the leading linear term in (1.5) will be unchanged, while the remainder term improves and satisfies $O(n^{-3/2})$ in case $E|X_1|^5 < +\infty$. But for $s = 6$, the result involves the subsequent coefficient $c_2$, which depends on $\gamma_3$, $\gamma_4$, and $\gamma_5$. In particular, if $\gamma_3 = 0$, we have $c_2 = \frac{1}{48}\gamma_4^2$, thus

$$D(Z_n) = \frac{1}{48n^2}\,(EX_1^4 - 3)^2 + o\left(\frac{1}{(n\log n)^2}\right) \qquad (EX_1^3 = 0,\ EX_1^6 < +\infty).$$
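Formula (1.4) can be checked numerically. The following sketch is our own illustration, with hypothetical helper names; the integrals are approximated by the trapezoidal rule on a truncated interval, and each product $q_{r_1}\cdots q_{r_k}/\varphi^{k-1}$ is written as $\varphi\prod_i (q_{r_i}/\varphi)$ to avoid dividing by tiny values of $\varphi$. It reproduces $c_1 = \gamma_3^2/12$ and, for $\gamma_3 = 0$, $c_2 = \gamma_4^2/48$.

```python
import math
from typing import Dict, Iterator, List, Tuple


def hermite(n: int, x: float) -> float:
    """Chebyshev-Hermite polynomial H_n(x)."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for j in range(1, n):
        h_prev, h = h, x * h - j * h_prev
    return h


def solutions(k: int) -> Iterator[Tuple[int, ...]]:
    """Non-negative (r_1, ..., r_k) with r_1 + 2 r_2 + ... + k r_k = k."""
    def rec(i: int, rest: int, acc: List[int]) -> Iterator[Tuple[int, ...]]:
        if i > k:
            if rest == 0:
                yield tuple(acc)
            return
        for r in range(rest // i + 1):
            yield from rec(i + 1, rest - i * r, acc + [r])
    yield from rec(1, k, [])


def q_over_phi(k: int, x: float, gamma: Dict[int, float]) -> float:
    """q_k(x) / phi(x), i.e. the polynomial factor in (1.2)."""
    total = 0.0
    for r in solutions(k):
        coef = 1.0
        for i, r_i in enumerate(r, start=1):
            if r_i:
                coef *= (gamma[i + 2] / math.factorial(i + 2)) ** r_i / math.factorial(r_i)
        total += coef * hermite(k + 2 * sum(r), x)
    return total


def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """Ordered tuples of `parts` positive integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def c_coefficient(j: int, gamma: Dict[int, float],
                  lo: float = -12.0, hi: float = 12.0, steps: int = 2400) -> float:
    """c_j from (1.4), integrating phi * prod(q_{r_i} / phi) by the trapezoidal rule."""
    h = (hi - lo) / steps
    result = 0.0
    for k in range(2, 2 * j + 1):
        weight = (-1) ** k / (k * (k - 1))
        for r in compositions(2 * j, k):
            integral = 0.0
            for i in range(steps + 1):
                x = lo + i * h
                value = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
                for r_i in r:
                    value *= q_over_phi(r_i, x, gamma)
                integral += value * (0.5 if i in (0, steps) else 1.0)
            result += weight * integral * h
    return result


if __name__ == "__main__":
    exponential = {3: 2.0, 4: 6.0, 5: 24.0}  # cumulants (r - 1)! of the centred Exp(1) law
    print(c_coefficient(1, exponential), c_coefficient(2, exponential))
```

For the centred exponential law, whose cumulants are $\gamma_r = (r-1)!$, the sketch gives $c_1 = 1/3$ and $c_2 \approx 1/12$.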

More generally, the representation (1.3) simplifies if the first $k-1$ moments of $X_1$ coincide with the corresponding moments of $Z \sim N(0,1)$.

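Corollary 1.2. Let $E|X_1|^s < +\infty$ ($s \ge 4$), and assume that $D(Z_{n_0}) < +\infty$ for some $n_0$. Given $k = 3, 4, \dots, [s]$, assume that $\gamma_j = 0$ for all $3 \le j < k$. Then

$$D(Z_n) = \frac{\gamma_k^2}{2\,k!}\cdot\frac{1}{n^{k-2}} + O\left(\frac{1}{n^{k-1}}\right) + o\left(\frac{1}{(n\log n)^{(s-2)/2}}\right). \tag{1.6}$$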
Johnson has noticed (though in terms of the standardized Fisher information, see [J], Lemma 2.12) that, if $\gamma_k \ne 0$, then $D(Z_n)$ cannot be better than $n^{-(k-2)}$.

Note that when $E|X_1|^{2k} < +\infty$, the $o$-term may be removed from the representation (1.6). On the other hand, when $k > \frac{s}{2} + 1$, the $o$-term will dominate the $n^{-(k-2)}$-term, and we can only say that $D(Z_n) = o\big((n\log n)^{-(s-2)/2}\big)$.

As for the missing range $2 \le s < 4$, there are no coefficients $c_j$ in the sum (1.3), and Theorem 1.1 just tells us that

$$D(Z_n) = o\left(\frac{1}{(n\log n)^{(s-2)/2}}\right). \tag{1.7}$$

This bound is worse than the rate $1/n$. In particular, it only gives $D(Z_n) = o(1)$ for $s = 2$, which is the statement of Barron's theorem. In fact, in this case the entropic distance to normality may decay to zero at an arbitrarily slow rate. In case of a finite 3-rd absolute moment, $D(Z_n) = o\big(1/\sqrt{n\log n}\big)$. To see that this and the more general relation (1.7) cannot be improved with respect to the powers of $1/n$, we prove:

Theorem 1.3. Let $\eta > 0$. Given $2 < s < 4$, there exists a sequence of independent identically distributed random variables $(X_n)_{n\ge1}$ with $E|X_1|^s < +\infty$, such that $D(X_1) < +\infty$ and

$$D(Z_n) \ge \frac{c(\eta)}{(n\log n)^{(s-2)/2}\,(\log n)^{\eta}}, \qquad n \ge n_1(X_1),$$

with a constant $c(\eta) > 0$ depending on $\eta$ only.



Known bounds on the entropy and Fisher information are commonly based on de Bruijn's identity, which may be used to represent the entropic distance to normality as an integral of the Fisher information for regularized distributions (cf. [Ba]). However, it is not clear how to reach exact asymptotics with this approach. The proofs of Theorems 1.1 and 1.3 stated above rely upon classical tools and results in the theory of summation of independent summands, including Edgeworth-type expansions for convolutions of densities formulated as local limit theorems with non-uniform remainder bounds. For non-integer values of $s$, the authors had to complement the otherwise extensive literature by recent, technically rather involved results based on fractional differential calculus, see [B-C-G]. Our approach applies as well to random variables in higher dimension and to non-identical distributions for summands with uniformly bounded $s$-th moments.

We start with the description of a truncation-of-density argument, which allows us to reduce many questions about bounding the entropic distance to the case of bounded densities (Section 2). In Section 3 we discuss known results about Edgeworth-type expansions that will be used in the proof of Theorem 1.1. The main steps of the proof, which rely on these results, are carried out in Sections 4-5. All auxiliary results also cover the scheme of i.i.d. random vectors in $\mathbb{R}^d$ (however, with integer values of $s$) and are finalized in Section 6 to obtain multidimensional variants of Theorem 1.1 and Corollary 1.2. Sections 7-8 are devoted to lower bounds on the entropic distance to normality for a special class of probability distributions on the real line, which are applied in the proof of Theorem 1.3.



2. Binomial decomposition of convolutions 



First let us comment on the assumptions in Theorem 1.1. It may occur that $X_1$ has a singular distribution, but the distributions of $X_1 + X_2$ and of all subsequent sums $S_n = X_1 + \dots + X_n$ ($n \ge 2$) are absolutely continuous (cf. [T]).

If it exists, the density $p$ of $X_1$ may or may not be bounded. In the first case, all the entropies $h(S_n)$ are finite. If $p$ is unbounded, it may happen that all $h(S_n)$ are infinite, even if $p$ is compactly supported. But it may also happen that $h(S_n)$ is finite for some $n = n_0$, and then the entropies are finite for all $n \ge n_0$ (see [Ba] for specific examples).

Denote by $p_n(x)$ the density of $Z_n = S_n/\sqrt{n}$ (when it exists). Since it is desirable to work with bounded densities, we will slightly modify $p_n$ at the expense of a small change in the entropy. Variants of the next construction are well-known; see e.g. [S-M], [I-L], where the central limit theorem was studied with respect to the total variation distance. Without any extra effort, we may assume that the $X_n$ take values in $\mathbb{R}^d$, which we equip with the usual inner product $\langle\cdot,\cdot\rangle$ and the Euclidean norm $|\cdot|$. For simplicity, we describe the construction in the situation where $X_1$ has a density $p(x)$ (cf. Remark 2.5 on the appropriate modifications in the general case).

Let $m_0 \ge 0$ be a fixed integer. (For the purposes of Theorem 1.1, one may take $m_0 = [s] + 1$.) If $p$ is bounded, we put $\tilde p_n(x) = p_n(x)$ for all $n \ge 1$. Otherwise, the integral

$$b = \int_{\{p(x) > M\}} p(x)\,dx \tag{2.1}$$

is positive for all $M > 0$. Choose $M$ sufficiently large to satisfy, e.g., $0 < b < \frac{1}{2}$ (cf. Remark 2.4). In this case (when $p$ is unbounded), consider the decomposition

$$p(x) = (1-b)\,p_1(x) + b\,p_2(x), \tag{2.2}$$

where $p_1$, $p_2$ are the normalized restrictions of $p$ to the sets $\{p(x) \le M\}$ and $\{p(x) > M\}$, respectively. Hence, for the convolutions we have a binomial decomposition

$$p^{*n} = \sum_{k=0}^{n} C_n^k\,(1-b)^k\,b^{n-k}\,p_1^{*k} * p_2^{*(n-k)}.$$

For $n \ge m_0 + 1$, we split the above sum into two parts, so that $p^{*n} = \rho_{n1} + \rho_{n2}$ with

$$\rho_{n1} = \sum_{k=m_0+1}^{n} C_n^k\,(1-b)^k\,b^{n-k}\,p_1^{*k} * p_2^{*(n-k)}, \qquad \rho_{n2} = \sum_{k=0}^{m_0} C_n^k\,(1-b)^k\,b^{n-k}\,p_1^{*k} * p_2^{*(n-k)}.$$

Note that, whenever $b < b_1 < \frac{1}{2}$,

$$\varepsilon_n \equiv \int \rho_{n2}(x)\,dx = \sum_{k=0}^{m_0} C_n^k\,(1-b)^k\,b^{n-k} \le n^{m_0}\,b^{n-m_0} = o(b_1^n), \quad \text{as } n \to \infty. \tag{2.3}$$
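The geometric smallness of $\varepsilon_n$ in (2.3) is easy to observe numerically. This sketch is our own; the values $b = 0.1$, $b_1 = 0.2$ and $m_0 = 3$ are hypothetical choices satisfying $b < b_1 < \frac{1}{2}$, and the function names are not from the paper.

```python
import math


def eps_n(n: int, b: float, m0: int) -> float:
    """epsilon_n from (2.3): mass of the terms k = 0, ..., m0 of the binomial decomposition."""
    return sum(math.comb(n, k) * (1 - b) ** k * b ** (n - k) for k in range(m0 + 1))


def crude_bound(n: int, b: float, m0: int) -> float:
    """The bound n^{m0} b^{n - m0} used in (2.3)."""
    return n ** m0 * b ** (n - m0)


if __name__ == "__main__":
    b, b1, m0 = 0.1, 0.2, 3  # illustrative values with b < b1 < 1/2
    for n in (10, 50, 150):
        print(n, eps_n(n, b, m0), crude_bound(n, b, m0), eps_n(n, b, m0) / b1 ** n)
```

The last column, $\varepsilon_n/b_1^n$, collapses to zero very quickly, which is the $o(b_1^n)$ statement in (2.3).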

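Finally define

$$\tilde p_n(x) = \tilde p_{n1}(x) = \frac{1}{1-\varepsilon_n}\,n^{d/2}\rho_{n1}(x\sqrt{n}), \tag{2.4}$$

and similarly $\tilde p_{n2}(x) = \frac{1}{\varepsilon_n}\,n^{d/2}\rho_{n2}(x\sqrt{n})$. Thus, we have the desired decomposition

$$p_n(x) = (1-\varepsilon_n)\,\tilde p_{n1}(x) + \varepsilon_n\,\tilde p_{n2}(x). \tag{2.5}$$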

The probability densities $\tilde p_{n1}(x)$ are bounded and provide a strong approximation for $p_n(x) = n^{d/2}p^{*n}(x\sqrt{n})$. In particular, from (2.3)-(2.5) it follows that

$$\int |\tilde p_{n1}(x) - p_n(x)|\,dx < 2^{-n}$$

for all $n$ large enough. One of the immediate consequences of this estimate is the bound

$$|\tilde v_{n1}(t) - v_n(t)| < 2^{-n} \qquad (t \in \mathbb{R}^d) \tag{2.6}$$

for the characteristic functions $v_n(t) = \int e^{i\langle t,x\rangle}p_n(x)\,dx$ and $\tilde v_{n1}(t) = \int e^{i\langle t,x\rangle}\tilde p_{n1}(x)\,dx$, corresponding to the densities $p_n$ and $\tilde p_{n1}$.

This property may be sharpened in case of finite moments. 



Lemma 2.1. If $E|X_1|^s < +\infty$ ($s > 0$), then for all $n$ large enough,

$$\int (1 + |x|^s)\,|\tilde p_n(x) - p_n(x)|\,dx < 2^{-n}.$$

In particular, (2.6) also holds for all partial derivatives of $\tilde v_{n1}$ and $v_n$ up to order $m = [s]$.
Proof. By the definition (2.5), $|\tilde p_{n1}(x) - p_n(x)| \le \varepsilon_n(\tilde p_{n1}(x) + \tilde p_{n2}(x))$, so

$$\int |x|^s\,|\tilde p_{n1}(x) - p_n(x)|\,dx \le \frac{\varepsilon_n}{1-\varepsilon_n}\,n^{-s/2}\int |x|^s\rho_{n1}(x)\,dx + n^{-s/2}\int |x|^s\rho_{n2}(x)\,dx.$$

Let $U_1, U_2, \dots$ be independent copies of $U$ and $V_1, V_2, \dots$ be independent copies of $V$ (that are also independent of the $U_n$'s), where $U$ and $V$ are random vectors with densities $p_1$ and $p_2$, respectively. From (2.2),

$$\beta_s = E|X_1|^s = (1-b)\,E|U|^s + b\,E|V|^s,$$

so $E|U|^s \le \beta_s/b$ and $E|V|^s \le \beta_s/b$ (using $b < \frac{1}{2}$). Therefore, for the normalized sums

$$R_{k,n} = \frac{1}{\sqrt{n}}\,(U_1 + \dots + U_k + V_1 + \dots + V_{n-k}), \qquad 0 \le k \le n,$$

we have $E|R_{k,n}|^s \le \frac{\beta_s}{b}\,n^{s/2}$ if $s \ge 1$, and $E|R_{k,n}|^s \le \frac{\beta_s}{b}\,n^{1-s/2}$ if $0 < s < 1$. Hence, by the definition of $\rho_{n1}$ and $\rho_{n2}$,

$$\int |x|^s\rho_{n1}(x)\,dx = n^{s/2}\sum_{k=m_0+1}^{n} C_n^k\,(1-b)^k\,b^{n-k}\,E|R_{k,n}|^s \le \frac{\beta_s}{b}\,n^{s+1},$$

$$\int |x|^s\rho_{n2}(x)\,dx = n^{s/2}\sum_{k=0}^{m_0} C_n^k\,(1-b)^k\,b^{n-k}\,E|R_{k,n}|^s \le \frac{\beta_s}{b}\,n^{s+1}\varepsilon_n.$$

It remains to apply the estimate (2.3) on $\varepsilon_n$, and Lemma 2.1 follows.

We need to extend the assertion of Lemma 2.1 to the relative entropies, which we consider with respect to the standard normal distribution on $\mathbb{R}^d$ with density $\varphi(x) = (2\pi)^{-d/2}e^{-|x|^2/2}$. Thus put

$$D_n = \int p_n(x)\log\frac{p_n(x)}{\varphi(x)}\,dx, \qquad \tilde D_n = \int \tilde p_n(x)\log\frac{\tilde p_n(x)}{\varphi(x)}\,dx.$$



Lemma 2.2. If $X_1$ has a finite second moment and $D(X_1) < +\infty$, then $|\tilde D_n - D_n| < 2^{-n}$ for all $n$ large enough.

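First, we collect a few elementary properties of the convex function $L(u) = u\log u$ ($u \ge 0$).

Lemma 2.3. For all $u, v \ge 0$ and $0 < \varepsilon < 1$,

a) $L((1-\varepsilon)u + \varepsilon v) \le (1-\varepsilon)L(u) + \varepsilon L(v)$;

b) $L((1-\varepsilon)u + \varepsilon v) \ge (1-\varepsilon)L(u) + \varepsilon L(v) + uL(1-\varepsilon) + vL(\varepsilon)$;

c) $L((1-\varepsilon)u + \varepsilon v) \ge (1-\varepsilon)L(u) - \frac{u}{e} - \frac{1}{e}$.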

The first assertion is just Jensen's inequality applied to $L$. For the second one, for fixed $v \ge 0$ and $0 < \varepsilon < 1$, consider the function

$$f(u) = \big[(1-\varepsilon)L(u) + \varepsilon L(v) + uL(1-\varepsilon) + vL(\varepsilon)\big] - L((1-\varepsilon)u + \varepsilon v), \qquad u \ge 0.$$

Clearly, $f(0) = 0$ and $f'(u) = (1-\varepsilon)\log\frac{(1-\varepsilon)u}{(1-\varepsilon)u + \varepsilon v} \le 0$. Hence $f(u) \le 0$, thus proving b).

For the last inequality, we use $L(x) \ge -\frac{1}{e}$ for all $x \ge 0$. Let $x = (1-\varepsilon)u + \varepsilon v \le 1$. Then $L((1-\varepsilon)u) \le 0$, so $L(x) \ge L((1-\varepsilon)u) - \frac{1}{e}$. This inequality is also true for $x \ge 1$, since $L$ is increasing and positive in this region. Finally, $L((1-\varepsilon)u) = (1-\varepsilon)L(u) + uL(1-\varepsilon)$ and $L(1-\varepsilon) \ge -\frac{1}{e}$.
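A quick randomized sanity check of Lemma 2.3 can be run as follows. It is only an illustration (not a proof and not from the paper); the sampling ranges, the seed and the tolerance are arbitrary choices of ours.

```python
import math
import random


def L(u: float) -> float:
    """L(u) = u log u, extended by continuity with L(0) = 0."""
    return u * math.log(u) if u > 0 else 0.0


def check_lemma_2_3(samples: int = 20000, seed: int = 1, tol: float = 1e-9) -> bool:
    """Test inequalities a), b), c) of Lemma 2.3 at random points (u, v, eps)."""
    rng = random.Random(seed)
    for _ in range(samples):
        u, v = rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0)
        eps = rng.uniform(0.001, 0.999)
        mix = L((1 - eps) * u + eps * v)
        jensen = (1 - eps) * L(u) + eps * L(v)
        ok_a = mix <= jensen + tol
        ok_b = mix >= jensen + u * L(1 - eps) + v * L(eps) - tol
        ok_c = mix >= (1 - eps) * L(u) - u / math.e - 1 / math.e - tol
        if not (ok_a and ok_b and ok_c):
            return False
    return True


if __name__ == "__main__":
    print(check_lemma_2_3())
```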

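Proof of Lemma 2.2. Assuming that $p$ is (essentially) unbounded, define

$$D_{nj} = \int \tilde p_{nj}(x)\log\frac{\tilde p_{nj}(x)}{\varphi(x)}\,dx \qquad (j = 1, 2),$$

so that $\tilde D_n = D_{n1}$. By Lemma 2.3 a), $D_n \le (1-\varepsilon_n)D_{n1} + \varepsilon_n D_{n2}$. On the other hand, by b),

$$D_n \ge \big((1-\varepsilon_n)D_{n1} + \varepsilon_n D_{n2}\big) + \varepsilon_n\log\varepsilon_n + (1-\varepsilon_n)\log(1-\varepsilon_n).$$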

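In view of (2.3), the two estimates give

$$|\tilde D_n - D_n| \le C\,(n + D_{n1} + D_{n2})\,b_1^n, \tag{2.7}$$

which holds for all $n \ge 1$ with some constant $C$. In addition, by the inequality in c) with $\varepsilon = b$, from (2.2) it follows that

$$D(X_1\|Z) = \int L\left(\frac{p(x)}{\varphi(x)}\right)\varphi(x)\,dx \ge (1-b)\int p_1(x)\log\frac{p_1(x)}{\varphi(x)}\,dx - \frac{2}{e},$$

where $Z$ denotes a standard normal random vector in $\mathbb{R}^d$. For the same reason,

$$D(X_1\|Z) \ge b\int p_2(x)\log\frac{p_2(x)}{\varphi(x)}\,dx - \frac{2}{e}.$$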

But, if $p$ is the density of a random vector $U$ with finite second moment, the relative entropy

$$D(U\|Z_{a,\Sigma}) = \int p(x)\log\frac{p(x)}{\varphi_{a,\Sigma}(x)}\,dx \qquad (Z_{a,\Sigma} \sim N(a,\Sigma))$$

is minimized for the mean $a = EU$ and covariance matrix $\Sigma = \mathrm{Var}(U)$, and is then equal to $D(U)$. Hence, from the above lower bounds on $D(X_1\|Z)$ we obtain that

$$D(X_1\|Z) \ge (1-b)\,D(U) - \frac{2}{e}, \qquad D(X_1\|Z) \ge b\,D(V) - \frac{2}{e}, \tag{2.8}$$

where $U$ and $V$ have densities $p_1$ and $p_2$, as in the proof of Lemma 2.1.
Now, by (2.2),

$$\beta_2 = E|X_1|^2 = (1-b)\,E|U|^2 + b\,E|V|^2 = (1-b)\big(|a_1|^2 + \mathrm{Tr}(\Sigma_1)\big) + b\big(|a_2|^2 + \mathrm{Tr}(\Sigma_2)\big) \qquad (\beta_2 > 0),$$

where $a_1 = EU$, $a_2 = EV$, and $\Sigma_1 = \mathrm{Var}(U)$, $\Sigma_2 = \mathrm{Var}(V)$. In particular, $|a_1|^2 \le \beta_2/b$ and $|a_2|^2 \le \beta_2/b$, and similarly for the traces of the covariance matrices. Note that both $U$ and $V$ have non-degenerate distributions, so the determinants $\sigma_j^2 = \det(\Sigma_j)$ are positive.




Let $U_1, U_2, \dots$ be independent copies of $U$ and $V_1, V_2, \dots$ be independent copies of $V$ (that are also independent of the $U_n$'s). Again, by the convexity of the function $u\log u$,

$$D_{n1} \le \frac{1}{1-\varepsilon_n}\sum_{k=m_0+1}^{n} C_n^k\,(1-b)^k\,b^{n-k}\int r_{k,n}(x)\log\frac{r_{k,n}(x)}{\varphi(x)}\,dx, \tag{2.9}$$

$$D_{n2} \le \frac{1}{\varepsilon_n}\sum_{k=0}^{m_0} C_n^k\,(1-b)^k\,b^{n-k}\int r_{k,n}(x)\log\frac{r_{k,n}(x)}{\varphi(x)}\,dx, \tag{2.10}$$

where $r_{k,n}$ are the densities of the normalized sums $R_{k,n}$ from the proof of Lemma 2.1.

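On the other hand, if $R$ is a random vector in $\mathbb{R}^d$ with density $r(x)$ such that $\det(\mathrm{Var}(R)) = \sigma^2$ ($\sigma > 0$), then

$$D(R\|Z) = \int r(x)\log\frac{r(x)}{\varphi(x)}\,dx = D(R) + \log\frac{1}{\sigma} + \frac{E|R|^2}{2} - \frac{d}{2}. \tag{2.11}$$

In the case $R = R_{k,n}$, we have $ER = a_1\frac{k}{\sqrt{n}} + a_2\frac{n-k}{\sqrt{n}}$, so $|ER|^2 \le \frac{\beta_2}{b}\,n$. Also, $\mathrm{Var}(R) = \frac{k}{n}\mathrm{Var}(U) + \frac{n-k}{n}\mathrm{Var}(V)$, implying $\mathrm{Tr}(\mathrm{Var}(R)) \le \frac{\beta_2}{b}$. The two bounds give $E|R|^2 \le \frac{\beta_2}{b}\,(n+1)$. In addition, by the Minkowski inequality for the determinants of positive definite matrices, $\det(\mathrm{Var}(R)) \ge \sigma_0^2 = \min(\sigma_1^2, \sigma_2^2)$. Hence, by (2.11),

$$D(R_{k,n}\|Z) \le D(R_{k,n}) + \log\frac{1}{\sigma_0} + \frac{\beta_2(n+1)}{2b}. \tag{2.12}$$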
As for $D(R_{k,n})$, they can be estimated by virtue of the general entropy power inequality

$$e^{2h(X+Y)/d} \ge e^{2h(X)/d} + e^{2h(Y)/d},$$

which holds true for arbitrary independent random vectors in $\mathbb{R}^d$ with finite second moments and absolutely continuous distributions (cf. [Bl], [C-D-T]). It easily implies another general bound $D(X+Y) \le \max\{D(X), D(Y)\}$. So, taking into account (2.8) and using $b < \frac{1}{2}$, we get

$$D(R_{k,n}) \le \max\{D(U), D(V)\} \le \frac{1}{b}\left(D(X_1\|Z) + \frac{2}{e}\right).$$

Together with (2.12) we arrive at $D(R_{k,n}\|Z) \le C\,(1 + D(X_1\|Z))\,n$ with some constant $C$, and from (2.9)-(2.10)

$$D_{n1} \le C\,(1 + D(X_1\|Z))\,n, \qquad D_{n2} \le C\,(1 + D(X_1\|Z))\,n.$$

Since $D(X_1\|Z)$ is finite, it remains to apply (2.7). Lemma 2.2 is proved.
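Identity (2.11) can be checked numerically in dimension $d = 1$ for a law whose entropy is known in closed form. The sketch below is our own illustration: it takes $R$ uniform on $[c-w, c+w]$ (a hypothetical test case; any law with explicit entropy would do) and compares a midpoint-rule evaluation of $D(R\|Z)$ with the right-hand side of (2.11).

```python
import math


def kl_uniform_vs_standard_normal(c: float, w: float, steps: int = 20000) -> float:
    """D(R||Z) = int r log(r / phi) for R ~ Uniform[c - w, c + w], by the midpoint rule."""
    density = 1.0 / (2.0 * w)
    h = 2.0 * w / steps
    total = 0.0
    for i in range(steps):
        x = c - w + (i + 0.5) * h
        log_phi = -0.5 * math.log(2.0 * math.pi) - x * x / 2.0
        total += density * (math.log(density) - log_phi)
    return total * h


def right_side_2_11(c: float, w: float) -> float:
    """D(R) + log(1/sigma) + E|R|^2 / 2 - d/2 for the same R, with d = 1."""
    var = w * w / 3.0                       # sigma^2 = det Var(R)
    entropy = math.log(2.0 * w)             # h(R) for the uniform law
    d_r = 0.5 * math.log(2.0 * math.pi * math.e * var) - entropy  # D(R) = h(Z) - h(R)
    second_moment = c * c + var             # E|R|^2
    return d_r - 0.5 * math.log(var) + second_moment / 2.0 - 0.5


if __name__ == "__main__":
    for c, w in [(0.0, 1.0), (0.3, 1.2), (-1.0, 0.5)]:
        print(c, w, kl_uniform_vs_standard_normal(c, w), right_side_2_11(c, w))
```

The two columns agree up to quadrature error, illustrating that the mean enters (2.11) only through $E|R|^2$.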

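Remark 2.4. If $X_1$ has a finite second moment and $D(X_1) < +\infty$, the parameter $M$ from (2.1) can be chosen explicitly in terms of $b$ by involving the entropic distance $D(X_1)$ and $\sigma^2 = \det(\Sigma)$, where $\Sigma$ is the covariance matrix of $X_1$.

Indeed, putting $a = EX_1$, we have a simple upper estimate

$$\int p\log\left(1 + \frac{p}{\varphi_{a,\Sigma}}\right)dx = \int \frac{p}{\varphi_{a,\Sigma}}\log\left(1 + \frac{p}{\varphi_{a,\Sigma}}\right)\varphi_{a,\Sigma}\,dx \le \int \frac{p}{\varphi_{a,\Sigma}}\log\frac{p}{\varphi_{a,\Sigma}}\,\varphi_{a,\Sigma}\,dx + 1 = D(X_1) + 1.$$

On the other hand, since $\varphi_{a,\Sigma}(x) \le (2\pi)^{-d/2}\sigma^{-1}$, the original expression majorizes

$$\int_{\{p(x) > M\}} p(x)\log\frac{p(x)}{\varphi_{a,\Sigma}(x)}\,dx \ge b\log M + \frac{b}{2}\log\big((2\pi)^d\sigma^2\big),$$

so

$$M \le \frac{1}{(2\pi)^{d/2}\sigma}\,e^{(D(X_1)+1)/b}.$$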

Remark 2.5. If the $Z_n$ have absolutely continuous distributions with finite entropies for $n \ge n_0 > 1$, the above construction should be properly modified.

Namely, one may put $\tilde p_n = p_n$ if the $p_n$ are bounded, and otherwise apply the same decomposition (2.2) to $p_{n_0}$ in place of $p$. As a result, for any $n = An_0 + B$ ($A \ge 1$, $0 \le B \le n_0 - 1$), the partial sum $S_n$ will have the density

$$r_n(x) = \sum_{k=0}^{A} C_A^k\,(1-b)^k\,b^{A-k}\int \big(p_1^{*k} * p_2^{*(A-k)}\big)(x-y)\,dF_B(y),$$

where $F_B$ is the distribution of $S_B$. For $A \ge m_0 + 1$, split the above sum into two parts with summation over $m_0 + 1 \le k \le A$ and $0 \le k \le m_0$, respectively, so that $r_n = \rho_{n1} + \rho_{n2}$. Then, as in (2.4) and for the same sequence $\varepsilon_n$ described in (2.3), define

$$\tilde p_n(x) = \frac{1}{1-\varepsilon_n}\,n^{d/2}\rho_{n1}(x\sqrt{n}).$$

Clearly, these densities are bounded and strongly approximate $p_n(x)$. In particular, for all $n$ large enough, they satisfy estimates similar to those in Lemmas 2.1-2.2.



3. Edgeworth-type expansions 

Let $(X_n)_{n\ge1}$ be independent identically distributed random variables with mean $EX_1 = 0$ and variance $\mathrm{Var}(X_1) = 1$. In this section we collect some auxiliary results about Edgeworth-type expansions both for the distribution functions $F_n(x) = P\{Z_n \le x\}$ and the densities $p_n(x)$ of the normalized sums $Z_n = S_n/\sqrt{n}$, where $S_n = X_1 + \dots + X_n$.

If the absolute moment $E|X_1|^s$ is finite for a given $s \ge 2$ and $m = [s]$, define

$$\varphi_m(x) = \varphi(x) + \sum_{k=1}^{m-2} q_k(x)\,n^{-k/2} \tag{3.1}$$

with the functions $q_k$ described in (1.2). Put also

$$\Phi_m(x) = \int_{-\infty}^{x}\varphi_m(y)\,dy = \Phi(x) + \sum_{k=1}^{m-2} Q_k(x)\,n^{-k/2}. \tag{3.2}$$

Similarly to (1.2), the functions $Q_k$ have an explicit description involving the cumulants $\gamma_3, \dots, \gamma_{k+2}$ of $X_1$. Namely,

$$Q_k(x) = -\varphi(x)\sum H_{k+2j-1}(x)\,\frac{1}{r_1!\cdots r_k!}\left(\frac{\gamma_3}{3!}\right)^{r_1}\cdots\left(\frac{\gamma_{k+2}}{(k+2)!}\right)^{r_k},$$

where the summation is carried out over all non-negative integer solutions $(r_1, \dots, r_k)$ to the equation $r_1 + 2r_2 + \dots + kr_k = k$ with $j = r_1 + \dots + r_k$ (cf. e.g. [B-RR] or [Pe2] for details).






Theorem 3.1. Assume that $\limsup_{|t|\to\infty}|E\,e^{itX_1}| < 1$. If $E|X_1|^s < +\infty$ ($s \ge 2$), then as $n \to \infty$, uniformly over all $x$,

$$(1 + |x|^s)\,(F_n(x) - \Phi_m(x)) = o\big(n^{-(s-2)/2}\big). \tag{3.3}$$

The implied error term in (3.3) holds uniformly in classes of distributions with prescribed rates of decay of the functions $t \to E|X_1|^s\,1_{\{|X_1|>t\}}$ and $T \to \sup_{|t|\ge T}|E\,e^{itX_1}|$.

For $2 \le s < 3$ and $m = 2$, there are no terms in the sum (3.2), and then $\Phi_2(x) = \Phi(x)$ is the distribution function of the standard normal law. In this case, (3.3) becomes

$$(1 + |x|^s)\,(F_n(x) - \Phi(x)) = o\big(n^{-(s-2)/2}\big). \tag{3.4}$$

In fact, in this case Cramér's condition on the characteristic function is not used. The result was obtained by Osipov and Petrov [O-P] (cf. also [Bi], where (3.4) is established with $O$).

In the case $s \ge 3$, Theorem 3.1 can be found in [Pe2] (Theorem 2, Ch. VI, p. 168). Note that when $s = m$ is an integer, the relation (3.3) without the factor $1 + |x|^m$ represents the classical Edgeworth expansion. It is essentially due to Cramér and is described in many papers and textbooks (cf. [E], [F]). However, the case of fractional values of $s$ is more delicate, especially in the next local limit theorem.

Theorem 3.2. Let $E|X_1|^s < +\infty$ ($s \ge 2$). Suppose $S_{n_0}$ has a bounded density for some $n_0$. Then for all $n$ large enough, $S_n$ have continuous bounded densities $p_n$ satisfying, as $n \to \infty$,

$$(1 + |x|^m)\,(p_n(x) - \varphi_m(x)) = o\big(n^{-(s-2)/2}\big) \tag{3.5}$$

uniformly over all $x$. Moreover,

$$(1 + |x|^s)\,(p_n(x) - \varphi_m(x)) = o\big(n^{-(s-2)/2}\big) + (1 + |x|^{s-m})\,\big(O(n^{-(m-1)/2}) + o(n^{-(s-2)})\big). \tag{3.6}$$

In case $2 \le s < 3$, the last remainder term on the right-hand side is dominating, and (3.6) becomes

$$(1 + |x|^s)\,(p_n(x) - \varphi(x)) = o\big(n^{-(s-2)/2}\big) + (1 + |x|^{s-2})\,o\big(n^{-(s-2)}\big).$$

If $s = m$ is an integer and $m \ge 3$, Theorem 3.2 is well-known; (3.5)-(3.6) then simplify to

$$(1 + |x|^m)\,(p_n(x) - \varphi_m(x)) = o\big(n^{-(m-2)/2}\big). \tag{3.7}$$

In this formulation the result is due to Petrov [Pe1] (cf. [Pe2], p. 211, or [B-RR], p. 192). Without the factor $1 + |x|^m$, the relation (3.7) goes back to the results of Cramér and Gnedenko (cf. [G-K]).

In the general (fractional) case, Theorem 3.2 has recently been obtained in [B-C-G] by involving the technique of Liouville fractional integrals and derivatives. The assertion (3.6) gives an improvement over (3.5) on relatively large intervals of the real axis, and this is essential in the case of non-integer $s$.

An obvious weak point in Theorem 3.2 is that it requires the boundedness of the densities $p_n$, which is, however, necessary for conclusions such as (3.5) or (3.7). Nevertheless, this condition may be removed if we require that (3.5)-(3.6) hold true for slightly modified densities, rather than for $p_n$.






Theorem 3.3. Let $E|X_1|^s < +\infty$ ($s \ge 2$). Suppose that, for all $n$ large enough, $S_n$ have absolutely continuous distributions with densities $p_n$. Then, for some bounded continuous densities $\tilde p_n$,

a) the relations (3.5)-(3.6) hold true for $\tilde p_n$ instead of $p_n$;

b) $\int_{-\infty}^{+\infty}(1 + |x|^s)\,|\tilde p_n(x) - p_n(x)|\,dx < 2^{-n}$, for all $n$ large enough;

c) $\tilde p_n(x) = p_n(x)$ almost everywhere, if $p_n$ is bounded (a.e.).

Here, the property c) is added to include Theorem 3.2 in Theorem 3.3 as a particular case. Moreover, one can use the densities $\tilde p_n$ constructed in the previous section with $m_0 = [s] + 1$. We refer to [B-C-G] for detailed proofs.

This more general assertion allows us to immediately recover, for example, the central limit theorem with respect to the total variation distance (without the assumption of boundedness of $p_n$). Namely, we have

$$\int_{-\infty}^{+\infty}|p_n(x) - \varphi_m(x)|\,dx = o\big(n^{-(s-2)/2}\big). \tag{3.8}$$

For $s = 2$ and $\varphi_2(x) = \varphi(x)$, this statement corresponds to a theorem of Prokhorov [Pr], while for $s = 3$ and $\varphi_3(x) = \varphi(x)\big(1 + \frac{\gamma_3}{6\sqrt{n}}\,(x^3 - 3x)\big)$ it corresponds to the result of Sirazhdinov and Mamatov [S-M].
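The gain from the first Edgeworth correction in $\varphi_3$ can be seen on a law with an explicit density for $Z_n$. The sketch below is our own illustration and assumes, as a convenient test case, $X_1 = E - 1$ with $E \sim \mathrm{Exp}(1)$, so that $S_n + n$ is Gamma($n$, 1) distributed and $\gamma_3 = 2$. It compares the largest deviations of $p_n$ from $\varphi$ and from $\varphi_3$ on a grid of points.

```python
import math


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def p_n(z: float, n: int) -> float:
    """Exact density of Z_n = S_n / sqrt(n) for X_i = E_i - 1, E_i ~ Exp(1)."""
    y = n + z * math.sqrt(n)  # S_n + n ~ Gamma(n, 1)
    if y <= 0:
        return 0.0
    return math.exp(0.5 * math.log(n) + (n - 1) * math.log(y) - y - math.lgamma(n))


def phi_3(z: float, n: int, gamma3: float = 2.0) -> float:
    """Edgeworth approximation phi(x) (1 + gamma_3 (x^3 - 3x) / (6 sqrt(n)))."""
    return phi(z) * (1 + gamma3 * (z ** 3 - 3 * z) / (6 * math.sqrt(n)))


def max_errors(n: int, grid):
    """Largest |p_n - phi| and |p_n - phi_3| over the given points."""
    normal = max(abs(p_n(z, n) - phi(z)) for z in grid)
    edgeworth = max(abs(p_n(z, n) - phi_3(z, n)) for z in grid)
    return normal, edgeworth


if __name__ == "__main__":
    grid = [i / 100 for i in range(-400, 401)]
    for n in (10, 50, 200):
        print(n, max_errors(n, grid))
```

In this example the normal error decays roughly like $n^{-1/2}$, while the Edgeworth error is markedly smaller, in line with $q_1$ capturing the leading skewness term.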

Multidimensional case 

Similar results are also available in the multidimensional case for integer values $s = m$. In the remaining part of this section, let $(X_n)_{n\ge1}$ denote independent identically distributed random vectors in the Euclidean space $\mathbb{R}^d$ with mean zero and identity covariance matrix.

Assuming $E|X_1|^m < +\infty$ for some integer $m \ge 2$ (where now $|\cdot|$ denotes the Euclidean norm), introduce the cumulants $\gamma_\nu$ of $X_1$ and the associated cumulant polynomials $\gamma_k(t)$ up to order $m$ by using the equality

$$\gamma_k(t) = i^{-k}\,\frac{d^k}{d\varepsilon^k}\log E\,e^{i\varepsilon\langle t,X_1\rangle}\Big|_{\varepsilon=0} = k!\sum_{|\nu|=k}\gamma_\nu\,\frac{t^\nu}{\nu!} \qquad (k = 1, \dots, m,\ t \in \mathbb{R}^d).$$

Here the summation runs over all $d$-tuples $\nu = (\nu_1, \dots, \nu_d)$ with integer components $\nu_j \ge 0$ such that $|\nu| = \nu_1 + \dots + \nu_d = k$. We also write $\nu! = \nu_1!\cdots\nu_d!$ and use a standard notation for the generalized powers $z^\nu = z_1^{\nu_1}\cdots z_d^{\nu_d}$ of real or complex vectors $z = (z_1, \dots, z_d)$, which are treated as polynomials in $z$ of degree $|\nu|$.
For $1 \le k \le m-2$, define the polynomials

$$P_k(u) = \sum_{r_1+2r_2+\dots+kr_k=k}\frac{1}{r_1!\cdots r_k!}\left(\frac{\gamma_3(u)}{3!}\right)^{r_1}\cdots\left(\frac{\gamma_{k+2}(u)}{(k+2)!}\right)^{r_k}, \tag{3.9}$$

where the summation is performed over all non-negative integer solutions $(r_1, \dots, r_k)$ to the equation $r_1 + 2r_2 + \dots + kr_k = k$.

Furthermore, as in dimension one, define the approximating functions $\varphi_m(x)$ on $\mathbb{R}^d$ by virtue of the same equality (3.1), where every $q_k$ is determined by its Fourier transform

$$\int e^{i\langle t,x\rangle}q_k(x)\,dx = P_k(it)\,e^{-|t|^2/2}. \tag{3.10}$$
If $S_{n_0}$ has a bounded density for some $n_0$, then for all $n$ large enough, $S_n$ have continuous bounded densities $p_n$ satisfying (3.7); see [B-RR], Theorem 19.2. We need an extension of this theorem to the case of unbounded densities, as well as integral variants such as (3.8). The first assertion (3.11) in the next theorem is similar to the one-dimensional Theorem 3.3 in the case where $s = m$ is an integer, cf. (3.5). For the proof (which we omit), one may apply Lemma 2.1 and follow the standard arguments from [B-RR], Chapter 4.

Theorem 3.4. Suppose that $E|X_1|^m < +\infty$ with some integer $m \ge 2$. If, for all $n$ large enough, $S_n$ have densities $p_n$, then the densities $\tilde p_n$ introduced in Section 2 with $m_0 = m + 1$ satisfy

$$(1 + |x|^m)\,(\tilde p_n(x) - \varphi_m(x)) = o\big(n^{-(m-2)/2}\big) \tag{3.11}$$

uniformly over all $x$. In addition,

$$\int (1 + |x|^m)\,|p_n(x) - \varphi_m(x)|\,dx = o\big(n^{-(m-2)/2}\big). \tag{3.12}$$

The second assertion is Theorem 19.5 in [B-RR], where it is stated for $m \ge 3$ under a slightly weaker hypothesis that $X_1$ has a non-zero absolutely continuous component. Note that, by Lemma 2.1, it does not matter whether $p_n$ or $\tilde p_n$ is present in (3.12).

4. Entropic distance to normality and moderate deviations 

Let $X_1, X_2, \dots$ be independent identically distributed random vectors in $\mathbb{R}^d$ with mean zero, identity covariance matrix, and such that $D(Z_n) < +\infty$ for all $n$ large enough.

According to Lemma 2.2 and Remark 2.5, up to an error of at most $2^{-n}$ for large $n$, the entropic distance to normality, $D_n = D(Z_n)$, is equal to the relative entropy

$$\tilde D_n = \int \tilde p_n(x)\log\frac{\tilde p_n(x)}{\varphi(x)}\,dx,$$

where $\varphi$ is the density of a standard normal random vector $Z$ in $\mathbb{R}^d$.

Given $T \ge 1$, split the integral into two parts by writing

$$\tilde D_n = \int_{|x|\le T}\tilde p_n(x)\log\frac{\tilde p_n(x)}{\varphi(x)}\,dx + \int_{|x|>T}\tilde p_n(x)\log\frac{\tilde p_n(x)}{\varphi(x)}\,dx. \tag{4.1}$$

By Theorems 3.3-3.4, the $\tilde p_n$ are uniformly bounded, i.e., $\tilde p_n(x) \le M$ for all $x \in \mathbb{R}^d$ and $n \ge 1$ with some constant $M$. Hence, the second integral in (4.1) may be treated from the point of view of moderate deviations (when $T$ is not too large). Indeed, on one hand,

$$\int_{|x|>T}\tilde p_n(x)\log\frac{\tilde p_n(x)}{\varphi(x)}\,dx \le \int_{|x|>T}\tilde p_n(x)\log\frac{M}{\varphi(x)}\,dx \le C\int_{|x|>T}|x|^2\,\tilde p_n(x)\,dx,$$

where $C = \frac{1}{2} + \log M + \frac{d}{2}\log(2\pi)$. On the other hand, using $u\log u \ge u - 1$, we have a lower bound

$$\int_{|x|>T}\tilde p_n(x)\log\frac{\tilde p_n(x)}{\varphi(x)}\,dx \ge \int_{|x|>T}(\tilde p_n(x) - \varphi(x))\,dx \ge -P\{|Z| > T\}.$$

The two estimates give

$$\left|\int_{|x|>T}\tilde p_n(x)\log\frac{\tilde p_n(x)}{\varphi(x)}\,dx\right| \le P\{|Z| > T\} + C\int_{|x|>T}|x|^2\,\tilde p_n(x)\,dx. \tag{4.2}$$

This is a very general upper bound, valid for any probability density $\tilde p_n$ on $\mathbb{R}^d$ bounded by a constant $M$ (with $C$ as above).

Following (4.1), we are faced with two analytic problems. The first one is to give a sharp estimate of $|\tilde p_n(x) - \varphi(x)|$ on a relatively large Euclidean ball $|x| \le T$. Clearly, $T$ has to be small enough, so that results like local limit theorems, such as Theorems 3.2-3.4, may be applied. The second problem is to give a sharp upper bound of the last integral in (4.2). To this aim, we need moderate deviations inequalities, so that Theorems 3.1 and 3.4 are applicable. In any case, in order to use both types of results we are forced to choose $T$ from a very narrow window only. This value turns out to be approximately

$$T_n = \sqrt{(s-2)\log n + s\log\log n + \rho_n} \qquad (s > 2), \tag{4.3}$$

where $\rho_n \to +\infty$ is a sufficiently slowly growing sequence (whose growth will be restricted by the decay of the $n$-dependent constants in the $o$-expressions of Theorems 3.2-3.4). In case $s = 2$, one may put $T_n = \sqrt{\rho_n}$, meaning that $T_n \to +\infty$ is a sufficiently slowly growing sequence.
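The choice (4.3) is tuned so that $e^{-T_n^2/2} = n^{-(s-2)/2}(\log n)^{-s/2}e^{-\rho_n/2}$ exactly, which is what makes Gaussian tail terms such as $T_n e^{-T_n^2/2}$ negligible against $(n\log n)^{-(s-2)/2}$ later in this section. The sketch below is our own numerical illustration; the choice $\rho_n = \log\log n$ is only a hypothetical example of a slowly growing sequence.

```python
import math


def T_n(n: float, s: float, rho: float) -> float:
    """Truncation level (4.3): sqrt((s - 2) log n + s log log n + rho_n)."""
    return math.sqrt((s - 2) * math.log(n) + s * math.log(math.log(n)) + rho)


def gaussian_factor(n: float, s: float, rho: float) -> float:
    """exp(-T_n^2 / 2), equal to n^{-(s-2)/2} (log n)^{-s/2} e^{-rho/2}."""
    return math.exp(-T_n(n, s, rho) ** 2 / 2)


def tail_ratio(n: float, s: float, rho: float) -> float:
    """T_n e^{-T_n^2/2} divided by (n log n)^{-(s-2)/2}; tends to 0 as rho_n -> infinity."""
    t = T_n(n, s, rho)
    return t * gaussian_factor(n, s, rho) / (n * math.log(n)) ** (-(s - 2) / 2)


if __name__ == "__main__":
    s = 3.0
    for n in (1e3, 1e6, 1e9):
        rho = math.log(math.log(n))  # hypothetical slowly growing rho_n
        print(int(n), T_n(n, s, rho), tail_ratio(n, s, rho))
```

The ratio equals $T_n(\log n)^{-1}e^{-\rho_n/2}$, so it decreases slowly, which reflects how narrow the admissible window for $T$ is.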



Lemma 4.1. (The case $d = 1$ and $s$ real) If $EX_1 = 0$, $EX_1^2 = 1$, $E|X_1|^s < +\infty$ ($s \ge 2$), then

$$\int_{|x|>T_n} x^2\,\tilde p_n(x)\,dx = o\big((n\log n)^{-(s-2)/2}\big). \tag{4.4}$$

Lemma 4.2. (The case $d \ge 2$ and $s$ integer) If $X_1$ has mean zero and identity covariance matrix, and $E|X_1|^m < +\infty$, then

$$\int_{|x|>T_n}|x|^2\,\tilde p_n(x)\,dx = o\big(n^{-(m-2)/2}(\log n)^{-(m-d)/2}\big) \qquad (m \ge 3), \tag{4.5}$$

and $\int_{|x|>T_n}|x|^2\,\tilde p_n(x)\,dx = o(1)$ in case $m = 2$.

Note that plenty of results and techniques concerning moderate deviations have been developed by now. Useful estimates can be found e.g. in [G-H]. Restricting ourselves to integer values $s = m$, one may argue as follows.

Proof of Lemma 4.2. Given $T \ge 1$, write

$$\int_{|x|>T}|x|^2\,\tilde p_n(x)\,dx \le \frac{1}{T^{m-2}}\int_{|x|>T}|x|^m\,\tilde p_n(x)\,dx \le \frac{1}{T^{m-2}}\int |x|^m\,|\tilde p_n(x) - \varphi_m(x)|\,dx + \frac{1}{T^{m-2}}\int_{|x|>T}|x|^m\,\varphi_m(x)\,dx. \tag{4.6}$$

By Theorem 3.4, cf. (3.12), the first integral in (4.6) is bounded by $o(n^{-(m-2)/2})$.

From the definition of $q_k$ it follows that $q_k(x) = N(x)\varphi(x)$ with some polynomial $N$ of degree at most $3(m-2)$ (cf. Section 6 for details). Hence, from (3.1), $\varphi_m(x) \le 2\varphi(x)$ on balls of large radii $|x| \le n^\delta$ for sufficiently large $n$ (where $0 < \delta < \frac{1}{6}$). On the other hand, with some constants $C_d$, $C_d'$ depending on the dimension only,

$$\int_{|x|>T}|x|^m\,\varphi(x)\,dx = C_d\int_T^{+\infty} r^{m+d-1}e^{-r^2/2}\,dr \le C_d'\,T^{m+d-2}e^{-T^2/2}. \tag{4.7}$$

But for $T = T_n$ and $m \ge 3$, we have $e^{-T_n^2/2} = \frac{1}{(\log n)^{m/2}}\,o\big(n^{-(m-2)/2}\big)$, so by (4.6)-(4.7),

$$\int_{|x|>T_n}|x|^2\,\tilde p_n(x)\,dx = o\left(\frac{n^{-(m-2)/2}}{T_n^{m-2}}\right) + o\left(n^{-(m-2)/2}\,\frac{T_n^{d}}{(\log n)^{m/2}}\right).$$

Since $T_n$ is of order $\sqrt{\log n}$, (4.5) follows. Also, in the case $m = 2$, (4.6) gives the desired relation

$$\int_{|x|>T_n}|x|^2\,\tilde p_n(x)\,dx \le o(1) + \int_{|x|>T_n}|x|^2\,\varphi(x)\,dx \to 0 \qquad (n \to \infty).$$

Proof of Lemma 4.1. The above argument also works for $d = 1$, but it can be refined by applying Theorem 3.1 for real $s$. The case $s = 2$ is already covered, so let $s > 2$. In view of the decomposition (2.5), integrating by parts, we have, for any $T > 0$,

$$(1-\varepsilon_n)\int_{|x|>T}x^2\,\tilde p_n(x)\,dx \le \int_{|x|>T}x^2\,p_n(x)\,dx = \int_{|x|>T}x^2\,dF_n(x) \tag{4.8}$$

$$= T^2\big(1 - F_n(T) + F_n(-T)\big) + 2\int_T^{+\infty}x\,\big(1 - F_n(x) + F_n(-x)\big)\,dx, \tag{4.9}$$

where $F_n$ denotes the distribution function of $Z_n$. (Note that the first inequality in (4.8) should simply be ignored in the first case, when $p$ is bounded.)
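By (3.3),

$$F_n(x) = \Phi_m(x) + \frac{r_n(x)}{n^{(s-2)/2}(1 + |x|^s)}, \qquad r_n = \sup_x|r_n(x)| \to 0 \quad (n \to \infty).$$

Hence, the first term in (4.9) can be replaced with

$$T^2\big(1 - \Phi_m(T) + \Phi_m(-T)\big) \tag{4.10}$$

at the expense of an error not exceeding (for the values $T \sim \sqrt{\log n}$)

$$\frac{2r_n T^2}{n^{(s-2)/2}(1 + T^s)} = o\big((n\log n)^{-(s-2)/2}\big). \tag{4.11}$$

Similarly, the integral in (4.9) can be replaced with

$$\int_T^{+\infty}x\,\big(1 - \Phi_m(x) + \Phi_m(-x)\big)\,dx \tag{4.12}$$

at the expense of an error not exceeding

$$\frac{2r_n}{n^{(s-2)/2}}\int_T^{+\infty}\frac{x\,dx}{1 + x^s} \le \frac{2r_n}{(s-2)\,n^{(s-2)/2}\,T^{s-2}} = o\big((n\log n)^{-(s-2)/2}\big). \tag{4.13}$$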
To explore the behavior of the expressions (4.10) and (4.12) for $T = T_n$ using precise asymptotics as in (4.3), recall that, by (3.2),
$$1-\Phi_m(x) = 1-\Phi(x) - \sum_{k=1}^{m-2}Q_k(x)\,n^{-k/2}.$$
Moreover, we note that $Q_k(x) = N_{3k-1}(x)\,\varphi(x)$, where $N_{3k-1}$ is a polynomial of degree at most $3k-1$. Thus, these functions admit a bound $|Q_k(x)| \le C_m(1+|x|^{3m})\,\varphi(x)$ with some constants $C_m$ (depending on $m$ and the cumulants $\gamma_3,\dots,\gamma_m$ of $X_1$), which implies, with some other constants,
$$|1-\Phi_m(x)| \le \big(1-\Phi(x)\big) + \frac{C_m\,(1+|x|^{3m})}{\sqrt n}\,\varphi(x). \qquad (4.14)$$
Hence, using $1-\Phi(x) \le \frac{\varphi(x)}{x}$ $(x > 0)$, we get
$$T_n^2\,|1-\Phi_m(T_n)| \le C\,T_n^2\,\big(1-\Phi(T_n)\big) \le C\,T_n\,e^{-T_n^2/2} = o\big((n\log n)^{-(s-2)/2}\big). \qquad (4.15)$$

A similar bound also holds for $T_n^2\,|\Phi_m(-T_n)|$.

Now, we use (4.14) to estimate (4.12) with $T = T_n$, up to a constant, by
$$\int_T^{+\infty}x\,\big(1-\Phi(x)\big)\,dx \le 1-\Phi(T) = o\big((n\log n)^{-(s-2)/2}\big).$$

It remains to combine the last relation with (4.11), (4.13) and (4.15). Since $\varepsilon_n \to 0$, this yields (4.4), and Lemma 4.1 follows.
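The two elementary Gaussian facts used in (4.15) and in the last step, namely $1-\Phi(x) \le \varphi(x)/x$ for $x > 0$ and $\int_T^{\infty}x(1-\Phi(x))\,dx \le 1-\Phi(T)$, are easy to check numerically. In the sketch below (standard-library Python; all function names are illustrative), the second integral is evaluated through the closed form $\frac12\big(T\varphi(T) + (1-T^2)(1-\Phi(T))\big)$, obtained by integrating by parts; the inequality with $1-\Phi(T)$ is then equivalent to the classical lower Mills bound $1-\Phi(T) \ge T\varphi(T)/(1+T^2)$.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def upper_tail(x):
    """1 - Phi(x), computed with erfc to avoid cancellation for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def first_moment_tail(T):
    """Closed form of int_T^inf x (1 - Phi(x)) dx, obtained by integrating by parts."""
    return 0.5 * (T * phi(T) + (1 - T * T) * upper_tail(T))


def simpson(f, a, b, panels=20000):
    """Composite Simpson rule (panels even)."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


for T in (0.5, 1.0, 2.0, 4.0, 8.0):
    Q = upper_tail(T)
    assert T * phi(T) / (1 + T * T) <= Q <= phi(T) / T   # two-sided Mills-ratio bounds
    assert first_moment_tail(T) <= Q                      # the bound used after (4.15)

# cross-check the closed form against direct quadrature at T = 2
numeric = simpson(lambda x: x * upper_tail(x), 2.0, 40.0)
print(first_moment_tail(2.0), numeric)
```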



Remark 4.3. Note that the probabilities $\mathbb{P}\{|Z| > T\}$ appearing in (4.2) make a smaller contribution for $T = T_n$ in comparison with the right-hand sides of (4.4)-(4.5). Indeed, we have $\mathbb{P}\{|Z| > T\} \le C_d\,T^{d-2}e^{-T^2/2}$ $(T \ge 1)$. Hence, the relations (4.4)-(4.5) may be extended to the integrals



5. Taylor-type expansion for the entropic distance 



In this section we provide the last auxiliary step towards the proof of Theorem 1.1. In order to describe the multidimensional case, let $X_1, X_2, \dots$ be independent identically distributed random vectors in $\mathbb{R}^d$ with mean zero, identity covariance matrix, and such that $D(Z_{n_0}) < +\infty$ for some $n_0$.

If $p_{n_0}$ is bounded, then the densities $p_n$ of $Z_n$ $(n \ge n_0)$ are uniformly bounded, and we put $\tilde p_n = p_n$. Otherwise, we use the modified densities $\tilde p_n$ according to the construction of Section 2. In particular, if $\tilde Z_n$ has density $\tilde p_n$, then $|D(\tilde Z_n\|Z) - D(Z_n)| < 2^{-n}$ for all $n$ large enough (where $Z$ is a standard normal random vector, cf. Lemma 2.2 and Remark 2.5). Moreover, by Lemmas 4.1-4.2 and Remark 4.3,



$$D(Z_n) = \int_{|x|\le T_n}\tilde p_n(x)\log\frac{\tilde p_n(x)}{\varphi(x)}\,dx + o(\Delta_n), \qquad (5.1)$$
where $T_n$ are defined in (4.3) and
$$\Delta_n = n^{-(s-2)/2}\,(\log n)^{-(s-\max(d,2))/2} \qquad (5.2)$$
(with the convention that $\Delta_n = 1$ for the critical case $s = 2$).

Thus, all information about the asymptotic behavior of $D(Z_n)$ is contained in the integral in (5.1). More precisely, writing a Taylor expansion for $\tilde p_n$ using the approximating functions $\varphi_m$ in Theorems 3.2-3.4 leads to the following representation (which is more convenient in applications such as Corollary 1.2).

Theorem 5.1. Let $\mathbb{E}|X_1|^s < +\infty$ $(s \ge 2)$, assuming that $s$ is an integer in case $d \ge 2$. Then
$$D(Z_n) = \sum_{k=2}^{m-2}\frac{(-1)^k}{k(k-1)}\int\big(\varphi_m(x)-\varphi(x)\big)^k\,\frac{dx}{\varphi(x)^{k-1}} + o(\Delta_n) \qquad (m = [s]). \qquad (5.3)$$

Note that in case $2 \le s < 4$ there are no terms in the sum in (5.3), which then simplifies to $D(Z_n) = o(\Delta_n)$.

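Proof. In terms of $L(u) = u\log u$, rewrite the integral in (5.1) as
$$D_{n,1} := \int_{|x|\le T_n}L\Big(\frac{\tilde p_n(x)}{\varphi(x)}\Big)\,\varphi(x)\,dx = \int_{|x|\le T_n}L\big(1+u_m(x)+v_n(x)\big)\,\varphi(x)\,dx, \qquad (5.4)$$
where
$$u_m(x) = \frac{\varphi_m(x)-\varphi(x)}{\varphi(x)}, \qquad v_n(x) = \frac{\tilde p_n(x)-\varphi_m(x)}{\varphi(x)}.$$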

By Theorems 3.3-3.4, more precisely, by (3.6) for $d = 1$ and by (3.11) for $d \ge 2$ and $s = m$ integer, in the region $|x| = O(n^{\delta})$ with an appropriate $\delta > 0$, we have
$$|\tilde p_n(x)-\varphi_m(x)| \le \frac{r_n}{n^{(s-2)/2}\,(1+|x|^s)}, \qquad r_n \to 0. \qquad (5.5)$$
Since $\varphi(x)(1+|x|^s)$ is decreasing for large $|x|$, we obtain that, for all $|x| \le T_n$,
$$|v_n(x)| \le C\,\frac{r_n}{n^{(s-2)/2}\,T_n^{s}\,\varphi(T_n)} \le C'\,r_n\,e^{\rho_n/2}.$$
The last expression tends to zero by a suitable choice of $\rho_n \to \infty$. This will further be assumed. In particular, for $n$ large enough, $|v_n(x)| \le \frac14$ in $|x| \le T_n$.

From the definitions of $q_k$ and $\varphi_m$, cf. (1.2), (3.1), and (3.10), it follows that
$$|u_m(x)| \le C_m\,\frac{1+|x|^{3(m-2)}}{\sqrt n} \qquad (5.6)$$
with some constants depending on $m$ and the cumulants only. So we also have $|u_m(x)| \le \frac14$ for $|x| \le T_n$ with sufficiently large $n$.

Now, by Taylor's formula, for $|u| \le \frac14$, $|v| \le \frac14$,
$$L(1+u+v) = L(1+u) + v + \theta_1\,uv + \theta_2\,v^2$$
with some $|\theta_j| \le 2$ depending on $(u,v)$. Applying this representation with $u = u_m(x)$ and $v = v_n(x)$, we see that $v_n(x)$ can be removed from the right-hand side of (5.4) at the expense of an error not exceeding $|J_1| + J_2 + J_3$ (up to a constant factor), where
$$J_1 = \int_{|x|\le T_n}\big(\tilde p_n(x)-\varphi_m(x)\big)\,dx, \qquad J_2 = \int_{|x|\le T_n}|u_m(x)|\,|\tilde p_n(x)-\varphi_m(x)|\,dx,$$
and
$$J_3 = \int_{|x|\le T_n}\big(\tilde p_n(x)-\varphi_m(x)\big)^2\,\frac{dx}{\varphi(x)}.$$

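But, since both $\tilde p_n$ and $\varphi_m$ integrate to one,
$$|J_1| = \Big|\int_{|x|>T_n}\big(\tilde p_n(x)-\varphi_m(x)\big)\,dx\Big| \le \int_{|x|>T_n}\tilde p_n(x)\,dx + \int_{|x|>T_n}|\varphi_m(x)|\,dx. \qquad (5.7)$$

By Lemmas 4.1-4.2, the first integral on the right-hand side is bounded by $T_n^{-2}\,o(\Delta_n)$. Also, since $|\varphi_m(x)| \le 2\varphi(x)$ on the balls $|x| < n^{\delta}$ with sufficiently large $n$ (while the region $|x| \ge n^{\delta}$ contributes a negligible amount, as $\varphi_m$ is a polynomial multiple of $\varphi$), the last integral in (5.7) is bounded by $2\,\mathbb{P}\{|Z| > T_n\} + o(\Delta_n) = o(\Delta_n)$ as well. As a result, $J_1 = o(\Delta_n)$.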
Applying (5.6) and then the relation (3.12), we conclude as well that
$$J_2 \le C_m\,\frac{1+T_n^{3(m-2)}}{\sqrt n}\int_{|x|\le T_n}|\tilde p_n(x)-\varphi_m(x)|\,dx = o(\Delta_n).$$

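Finally, using (5.5) with $s > 2$, we get, up to some constants,
$$J_3 \le C\,\frac{r_n^2}{n^{s-2}}\int_{|x|\le T_n}\frac{1}{(1+|x|^s)^2}\,\frac{dx}{\varphi(x)} \le C_d'\,\frac{r_n^2}{n^{s-2}}\,\frac{e^{T_n^2/2}}{T_n^{2s-d+2}} = o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-d+2)/2}}\Big) = o(\Delta_n).$$

If $s = 2$, all the steps are also valid and give $|J_1| + J_2 + J_3 = o(1)$ for a suitably chosen $T_n \to +\infty$.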

Thus, at the expense of an error not exceeding $o(\Delta_n)$, one may remove $v_n(x)$ from (5.4), and then we obtain the relation
$$D_{n,1} = \int_{|x|\le T_n}L\big(1+u_m(x)\big)\,\varphi(x)\,dx + o(\Delta_n), \qquad (5.8)$$
which contains specific functions only.

Moreover, $u_m(x) = u_2(x) = 0$ for $2 \le s < 3$, and then the theorem is proved, since $L(1) = 0$.

Next, we consider the case $s \ge 3$. By Taylor's expansion around zero, whenever $|u| \le \frac14$,
$$L(1+u) = u + \sum_{k=2}^{m-2}\frac{(-1)^k}{k(k-1)}\,u^k + \theta\,|u|^{m-1}, \qquad |\theta| \le 1,$$
assuming that the sum has no terms in case $m = 3$. Hence, with some $|\theta| \le 1$,
$$\int_{|x|\le T_n}L\big(1+u_m(x)\big)\,\varphi(x)\,dx = \int_{|x|\le T_n}\big(\varphi_m(x)-\varphi(x)\big)\,dx \qquad (5.9)$$
$$+ \sum_{k=2}^{m-2}\frac{(-1)^k}{k(k-1)}\int_{|x|\le T_n}u_m(x)^k\,\varphi(x)\,dx + \theta\int_{\mathbb{R}^d}|u_m(x)|^{m-1}\,\varphi(x)\,dx. \qquad (5.10)$$

For $n$ large enough, since $\varphi_m - \varphi$ integrates to zero over $\mathbb{R}^d$, the integral on the right-hand side of (5.9) has an absolute value
$$\Big|\int_{|x|>T_n}\big(\varphi_m(x)-\varphi(x)\big)\,dx\Big| \le \int_{|x|>T_n}\varphi(x)\,dx = \mathbb{P}\{|Z| > T_n\} = o(\Delta_n).$$

This proves the theorem in case $3 \le s < 4$ (when $m = 3$).
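The Taylor representation of $L(1+u) = (1+u)\log(1+u)$ used in (5.9)-(5.10) can be checked numerically. The standard-library Python sketch below (function names are illustrative) computes, for several $m$, the largest value of $|\theta| = \big|L(1+u) - u - \sum_{k=2}^{m-2}\frac{(-1)^k}{k(k-1)}u^k\big| / |u|^{m-1}$ over a grid in $0.05 \le |u| \le \frac14$. Very small $|u|$ are skipped only to avoid floating-point cancellation; analytically the remainder is at most $\frac43\,\frac{|u|^{m-1}}{(m-1)(m-2)}$ on this range, so $|\theta| \le \frac23$.

```python
import math


def taylor_part(u, m):
    """u + sum_{k=2}^{m-2} (-1)^k u^k / (k (k - 1)); the sum is empty when m = 3."""
    return u + sum((-1) ** k * u ** k / (k * (k - 1)) for k in range(2, m - 1))


def worst_theta(m, points=200):
    """Largest |L(1+u) - taylor_part(u, m)| / |u|^(m-1) over 0.05 <= |u| <= 1/4."""
    worst = 0.0
    for i in range(points):
        a = 0.05 + 0.2 * i / (points - 1)
        for u in (a, -a):
            exact = (1 + u) * math.log1p(u)   # L(1+u); log1p keeps it accurate
            worst = max(worst, abs(exact - taylor_part(u, m)) / abs(u) ** (m - 1))
    return worst


for m in range(3, 9):
    print(m, round(worst_theta(m), 4))
```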

Now, let $s \ge 4$. The last integral in (5.10) can be estimated by virtue of (5.6) by
$$\frac{C}{n^{(m-1)/2}}\int_{\mathbb{R}^d}\big(1+|x|^{3(m-1)(m-2)}\big)\,\varphi(x)\,dx = o(\Delta_n).$$

In addition, the first integral in (5.10) can be extended to the whole space at the expense of an error not exceeding (for all $n$ large enough)
$$\int_{|x|>T_n}|u_m(x)|^k\,\varphi(x)\,dx \le \frac{C}{n^{k/2}}\int_{|x|>T_n}\big(1+|x|^{3k(m-2)}\big)\,\varphi(x)\,dx \le \frac{C'\,T_n^{3k(m-2)+d-2}\,e^{-T_n^2/2}}{\sqrt n} = o(\Delta_n).$$

Moreover, if $k > s-2$,
$$\int|u_m(x)|^k\,\varphi(x)\,dx \le \frac{C}{n^{k/2}}\int\big(1+|x|^{3k(m-2)}\big)\,\varphi(x)\,dx = o(\Delta_n).$$
Collecting these estimates in (5.9)-(5.10) and applying them in (5.8), we arrive at
$$D_{n,1} = \sum_{k=2}^{m-2}\frac{(-1)^k}{k(k-1)}\int u_m(x)^k\,\varphi(x)\,dx + o(\Delta_n).$$

It remains to apply (5.1). Thus, Theorem 5.1 is proved.

6. Theorem 1.1 and its multidimensional extension 

The desired representation (1.3) of Theorem 1.1 can be deduced from Theorem 5.1. Note 
that the latter covers the multidimensional case as well, although under somewhat stronger 
moment assumptions. 

Thus, let $(X_n)_{n\ge1}$ be independent identically distributed random vectors in $\mathbb{R}^d$ with finite second moment. If the normalized sum $Z_n = (X_1+\dots+X_n)/\sqrt n$ has density $p_n(x)$, the entropic distance to Gaussianity is defined, as in dimension one, to be the relative entropy
$$D(Z_n) = \int p_n(x)\log\frac{p_n(x)}{\varphi_{a,\Sigma}(x)}\,dx$$
with respect to the normal law on $\mathbb{R}^d$ with the same mean $a = \mathbb{E}X_1$ and covariance matrix $\Sigma = \mathrm{Var}(X_1)$. This quantity is affine invariant, and in this sense it does not depend on $(a, \Sigma)$.

Theorem 6.1. If $D(Z_{n_0}) < +\infty$ for some $n_0$, then $D(Z_n) \to 0$ as $n \to \infty$. Moreover, given that $\mathbb{E}|X_1|^s < +\infty$ $(s \ge 2)$, and that $X_1$ has mean zero and identity covariance matrix, we have
$$D(Z_n) = \frac{c_1}{n} + \frac{c_2}{n^2} + \dots + \frac{c_{[(m-2)/2]}}{n^{[(m-2)/2]}} + o(\Delta_n) \qquad (m = [s]), \qquad (6.1)$$
where $\Delta_n$ are defined in (5.2), and where we assume that $s$ is an integer in case $d \ge 2$.

As in Theorem 1.1, here each coefficient $c_j$ is defined according to (1.4). It may be represented as a certain polynomial in the cumulants $\gamma_\nu$, $3 \le |\nu| \le 2j+1$.

Proof. We shall start from the representation (5.3) of Theorem 5.1, so let us return to the definition (3.1),
$$\varphi_m(x) - \varphi(x) = \sum_{r=1}^{m-2}q_r(x)\,n^{-r/2}.$$

In the case $2 \le s < 3$ (that is, for $m = 2$), the right-hand side contains no terms and therefore vanishes. Anyhow, raising this sum to the power $k \ge 2$ leads to
$$\big(\varphi_m(x)-\varphi(x)\big)^k = \sum_j n^{-j/2}\sum q_{r_1}(x)\cdots q_{r_k}(x),$$
where the inner sum is carried out over all positive integers $r_1,\dots,r_k \le m-2$ such that $r_1+\dots+r_k = j$. Accordingly, the $k$-th integral in (5.3) is equal to
$$\sum_j n^{-j/2}\sum\int q_{r_1}(x)\cdots q_{r_k}(x)\,\frac{dx}{\varphi(x)^{k-1}}. \qquad (6.2)$$

Here the integrals vanish for odd $j$. In dimension one, this follows directly from the definition (1.2) of $q_r$ and the property of the Chebyshev-Hermite polynomials ([Sz])
$$\int_{-\infty}^{+\infty}H_{r_1}(x)\cdots H_{r_k}(x)\,\varphi(x)\,dx = 0 \qquad (r_1+\dots+r_k \ \text{is odd}). \qquad (6.3)$$

As for the general case, let us look at the structure of the functions $q_r$. Given a multi-index $\nu = (\nu_1,\dots,\nu_d)$ with integers $\nu_1,\dots,\nu_d \ge 0$, define $H_\nu(x_1,\dots,x_d) = H_{\nu_1}(x_1)\cdots H_{\nu_d}(x_d)$, so that
$$\int e^{i\langle t,x\rangle}H_\nu(x)\,\varphi(x)\,dx = (it)^\nu\,e^{-|t|^2/2}, \qquad t \in \mathbb{R}^d.$$

Hence, by the definition (3.10),
$$q_r(x) = \varphi(x)\sum_\nu a_\nu H_\nu(x), \qquad (6.4)$$
where the coefficients $a_\nu$ emerge from the expansion $P_r(it) = \sum_\nu a_\nu\,(it)^\nu$. Using (3.9), write these polynomials as
$$P_r(it) = \sum\frac{1}{l_1!\cdots l_r!}\Big(\sum_{|\nu|=3}\frac{\gamma_\nu}{\nu!}(it)^\nu\Big)^{l_1}\cdots\Big(\sum_{|\nu|=r+2}\frac{\gamma_\nu}{\nu!}(it)^\nu\Big)^{l_r}, \qquad (6.5)$$
where the outer summation is performed over all non-negative integer solutions $(l_1,\dots,l_r)$ to the equation $l_1 + 2l_2 + \dots + rl_r = r$. Removing the brackets of the inner sums, we obtain a linear combination of the power polynomials $(it)^\nu$ with exponents of order
$$|\nu| = 3l_1 + \dots + (r+2)\,l_r = r + 2b, \qquad b = l_1 + \dots + l_r. \qquad (6.6)$$
In particular, $r+2 \le |\nu| \le 3r$, so that $P_r(it)$ is a polynomial of degree at most $3r$, and thus $\varphi_m(x) = N(x)\varphi(x)$, where $N(x)$ is a polynomial of degree at most $3(m-2)$.
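The degree bookkeeping in (6.6) is purely combinatorial: the admissible $(l_1,\dots,l_r)$ are the partitions of $r$, and each yields monomials of degree $|\nu| = r + 2b$, where $b = l_1+\dots+l_r$ is the number of parts. The small Python sketch below (helper names are ours) enumerates them and confirms that the attained degrees are exactly $r+2, r+4, \dots, 3r$, all of the same parity as $r$, which is the parity fact behind (6.7).

```python
def solutions(r):
    """All tuples (l_1, ..., l_r) of non-negative integers with l_1 + 2 l_2 + ... + r l_r = r."""
    def rec(k, remaining):
        if k > r:
            if remaining == 0:
                yield ()
            return
        for l in range(remaining // k + 1):
            for rest in rec(k + 1, remaining - k * l):
                yield (l,) + rest
    return list(rec(1, r))


def degrees(r):
    """Sorted set of degrees |nu| = 3 l_1 + ... + (r + 2) l_r over all solutions, cf. (6.6)."""
    found = set()
    for l in solutions(r):
        b = sum(l)
        nu = sum((k + 2) * lk for k, lk in enumerate(l, start=1))
        assert nu == r + 2 * b          # the identity |nu| = r + 2b
        found.add(nu)
    return sorted(found)


for r in range(1, 7):
    print(r, len(solutions(r)), degrees(r))
```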
Moreover, from (6.4) and (6.6) it follows that
$$q_{r_1}(x)\cdots q_{r_k}(x) = \varphi(x)^k\sum a_{\nu^{(1)}}\cdots a_{\nu^{(k)}}\,H_{\nu^{(1)}}(x)\cdots H_{\nu^{(k)}}(x), \qquad (6.7)$$
where $|\nu^{(1)}| + \dots + |\nu^{(k)}| \equiv r_1 + \dots + r_k \pmod 2$. Hence, if $r_1+\dots+r_k$ is odd, the sum
$$|\nu^{(1)}| + \dots + |\nu^{(k)}| = \sum_{i=1}^d\big(\nu_i^{(1)} + \dots + \nu_i^{(k)}\big)$$
is odd as well. But then at least one of the inner sums, say with coordinate $i$, must be odd as well. Hence, in this case, the integral of (6.7) with respect to $\varphi(x)^{1-k}\,dx$ vanishes after integration over $x_i$, by property (6.3).

Thus, in the expression (6.2), only even values of $j$ should be taken into account.
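Property (6.3) reflects the parity $H_k(-x) = (-1)^kH_k(x)$ of the Chebyshev-Hermite polynomials. The sketch below (standard-library Python; helper names are illustrative) generates $H_k$ by the recurrence $H_{k+1}(x) = xH_k(x) - kH_{k-1}(x)$, integrates products against $\varphi$ by Simpson's rule on $[-12, 12]$, and shows that the integral vanishes when the indices have an odd sum, while, e.g., $\int H_1H_2H_3\,\varphi\,dx = 6$ and $\int H_3^2\,\varphi\,dx = 3! = 6$.

```python
import math


def hermite(k, x):
    """Chebyshev-Hermite polynomial H_k(x), via H_{k+1} = x H_k - k H_{k-1}."""
    prev, cur = 1.0, x
    if k == 0:
        return prev
    for j in range(1, k):
        prev, cur = cur, x * cur - j * prev
    return cur


def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def gaussian_moment(indices, half_width=12.0, panels=6000):
    """Simpson approximation of int H_{r_1}(x) ... H_{r_k}(x) phi(x) dx over [-half_width, half_width]."""
    f = lambda x: math.prod(hermite(r, x) for r in indices) * phi(x)
    h = 2 * half_width / panels
    total = f(-half_width) + f(half_width)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(-half_width + i * h)
    return total * h / 3


for idx in [(1, 2), (1, 1, 3), (2, 2, 3), (1, 2, 3), (3, 3)]:
    print(idx, "odd" if sum(idx) % 2 else "even", round(gaussian_moment(idx), 8))
```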

Moreover, since the terms containing $n^{-j/2}$ with $j > s-2$ will be absorbed by the remainder $o(\Delta_n)$ in the relation (6.1), we get from (5.3) and (6.2)
$$D(Z_n) = \sum_{k=2}^{m-2}\frac{(-1)^k}{k(k-1)}\sum_{\substack{j=2\\ j\ \text{even}}}^{m-2}n^{-j/2}\sum\int q_{r_1}(x)\cdots q_{r_k}(x)\,\frac{dx}{\varphi(x)^{k-1}} + o(\Delta_n).$$

Replace now $j$ with $2j$ and rearrange the summation: $D(Z_n) = \sum_{j \le (m-2)/2}c_j\,n^{-j} + o(\Delta_n)$ with
$$c_j = \sum_{k=2}^{2j}\frac{(-1)^k}{k(k-1)}\sum_{r_1+\dots+r_k=2j}\int q_{r_1}(x)\cdots q_{r_k}(x)\,\frac{dx}{\varphi(x)^{k-1}}. \qquad (6.8)$$

Here the inner summation is carried out over all positive integers $r_1,\dots,r_k \le m-2$ such that $r_1+\dots+r_k = 2j$. This implies $k \le 2j$. Also, $2j \le m-2$ is equivalent to $j \le [\frac{m-2}{2}]$. As a result, we arrive at the required relation (6.1) with the coefficients $c_j$ given by (6.8).

Theorem 6.1 and therefore Theorem 1.1 are proved.



Remark. In order to show that $c_j$ is a polynomial in the cumulants $\gamma_\nu$, $3 \le |\nu| \le 2j+1$, first note that $r_1+\dots+r_k = 2j$ with $r_1,\dots,r_k \ge 1$ implies $2j \ge \max_i r_i + (k-1)$, so $\max_i r_i \le 2j-1$. Thus, the maximal index of the functions $q_{r_i}$ in (6.8) does not exceed $2j-1$. On the other hand, it follows from (6.4)-(6.5) that $P_r$ and $q_r$ are polynomials in the same set of cumulants; more precisely, $P_r$ is a polynomial in $\gamma_\nu$ with $3 \le |\nu| \le r+2$.
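The index bookkeeping behind (6.8) and this Remark can also be enumerated directly. In the Python sketch below (helper names are illustrative), compositions of $2j$ into $k \ge 2$ positive parts exist exactly for $2 \le k \le 2j$, the largest part is $2j-1$, and consequently, since $q_r$ involves cumulants up to order $r+2$, the coefficient $c_j$ involves cumulants of order at most $2j+1$.

```python
def compositions(total, k):
    """All ordered k-tuples of positive integers summing to total."""
    if k == 1:
        return [(total,)] if total >= 1 else []
    result = []
    for first in range(1, total - k + 2):
        for rest in compositions(total - first, k - 1):
            result.append((first,) + rest)
    return result


def index_structure(j):
    """Admissible k, largest r_i, and highest cumulant order entering c_j in (6.8)."""
    ks, largest = [], 0
    for k in range(2, 2 * j + 2):          # k = 2j + 1 is tried to show that it never occurs
        comps = compositions(2 * j, k)
        if comps:
            ks.append(k)
            largest = max(largest, max(max(c) for c in comps))
    return ks, largest, largest + 2


for j in range(1, 5):
    print(j, index_structure(j))
```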

Proof of Corollary 1.2. By Theorem 5.1, cf. (5.3),
$$D(Z_n) = \sum_{k=2}^{m-2}\frac{(-1)^k}{k(k-1)}\int\big(\varphi_m(x)-\varphi(x)\big)^k\,\frac{dx}{\varphi(x)^{k-1}} + o(\Delta_n). \qquad (6.9)$$

Assume that $m \ge 4$ and $\gamma_3 = \dots = \gamma_{k-1} = 0$ for a given integer $3 \le k \le m$. (There is no restriction when $k = 3$.) Then, by (1.2), $q_1 = \dots = q_{k-3} = 0$, while $q_{k-2}(x) = \frac{\gamma_k}{k!}H_k(x)\varphi(x)$. Hence, according to definition (3.1),
$$\varphi_m(x) - \varphi(x) = \frac{\gamma_k}{k!}H_k(x)\varphi(x)\,\frac{1}{n^{(k-2)/2}} + \sum_{j=k-1}^{m-2}\frac{q_j(x)}{n^{j/2}},$$
where the sum is empty in the case $m = 3$. Therefore, the sum in (1.3) will contain powers of $1/n$ starting from $1/n^{k-2}$, and the leading coefficient is due to the quadratic term in (6.9) (the summand with power $2$). More precisely, if $k-2 \le [\frac{m-2}{2}]$, we get that $c_1 = \dots = c_{k-3} = 0$, and
$$c_{k-2} = \frac{\gamma_k^2}{2\,(k!)^2}\int_{-\infty}^{+\infty}H_k(x)^2\,\varphi(x)\,dx = \frac{\gamma_k^2}{2\,k!}. \qquad (6.10)$$

Hence, if $k \le \frac{s}{2}$, (6.9) yields $D(Z_n) = \frac{\gamma_k^2}{2\,k!}\,\frac{1}{n^{k-2}} + O\big(n^{-(k-1)}\big)$. Otherwise, the $O$-term should be replaced by $o\big((n\log n)^{-(s-2)/2}\big)$. Thus Corollary 1.2 is proved.
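Formula (6.10) rests on $\int H_k^2\,\varphi\,dx = k!$. As an illustration (standard-library Python; the cumulant value `gamma_k = 0.7` is an arbitrary example, not taken from the paper), the sketch computes $\frac12\int\big(\frac{\gamma_k}{k!}H_k\big)^2\varphi\,dx$ by quadrature and compares it with $\gamma_k^2/(2\,k!)$; for $k = 3$ this is the leading coefficient $c_1 = \gamma_3^2/12$.

```python
import math


def hermite(k, x):
    """Chebyshev-Hermite polynomial H_k(x) via the three-term recurrence."""
    prev, cur = 1.0, x
    if k == 0:
        return prev
    for j in range(1, k):
        prev, cur = cur, x * cur - j * prev
    return cur


def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def simpson(f, a, b, panels=6000):
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def leading_coefficient(gamma_k, k):
    """(1/2) int ((gamma_k / k!) H_k(x))^2 phi(x) dx, the quadratic term of (6.9)."""
    coef = gamma_k / math.factorial(k)
    return 0.5 * simpson(lambda x: (coef * hermite(k, x)) ** 2 * phi(x), -12.0, 12.0)


gamma_k = 0.7   # hypothetical value of the first non-vanishing cumulant
for k in (3, 4, 5, 6):
    print(k, leading_coefficient(gamma_k, k), gamma_k ** 2 / (2 * math.factorial(k)))
```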

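By a similar argument, the conclusion may be extended to the multidimensional case. Indeed, if $\gamma_\nu = 0$ for all $3 \le |\nu| < k$, then by (6.5), $P_1 = \dots = P_{k-3} = 0$, while
$$P_{k-2}(it) = \sum_{|\nu|=k}\frac{\gamma_\nu}{\nu!}\,(it)^\nu.$$

Correspondingly, in (6.4) we have $q_1 = \dots = q_{k-3} = 0$ and $q_{k-2}(x) = \varphi(x)\sum_{|\nu|=k}\frac{\gamma_\nu}{\nu!}H_\nu(x)$. Therefore,
$$\varphi_m(x) - \varphi(x) = \varphi(x)\sum_{|\nu|=k}\frac{\gamma_\nu}{\nu!}H_\nu(x)\,\frac{1}{n^{(k-2)/2}} + \sum_{j=k-1}^{m-2}\frac{q_j(x)}{n^{j/2}}.$$

Applying this relation in (6.9), we arrive at (6.1) with $c_1 = \dots = c_{k-3} = 0$ and, by orthogonality of the polynomials $H_\nu$,
$$c_{k-2} = \frac12\int\Big(\sum_{|\nu|=k}\frac{\gamma_\nu}{\nu!}H_\nu(x)\Big)^2\varphi(x)\,dx = \frac12\sum_{|\nu|=k}\frac{\gamma_\nu^2}{\nu!}.$$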

We may summarize our findings as follows. 

Corollary 6.2. Let $(X_n)_{n\ge1}$ be i.i.d. random vectors in $\mathbb{R}^d$ $(d \ge 2)$ with mean zero and identity covariance matrix. Suppose that $\mathbb{E}|X_1|^m < +\infty$ for some integer $m \ge 4$, and $D(Z_{n_0}) < +\infty$ for some $n_0$. Given $k = 3, 4, \dots, m$, if $\gamma_\nu = 0$ for all $3 \le |\nu| < k$, we have
$$D(Z_n) = \frac{1}{2n^{k-2}}\sum_{|\nu|=k}\frac{\gamma_\nu^2}{\nu!} + O\Big(\frac{1}{n^{k-1}}\Big) + o\Big(\frac{1}{n^{(m-2)/2}\,(\log n)^{(m-d)/2}}\Big). \qquad (6.11)$$

The conclusion corresponds to Corollary 1.2 if we replace $d$ with $2$ in the remainder on the right-hand side.

As in dimension one, when EX^ < +oo, the o-term may be removed from this represen- 
tation, while for k > the o-term dominates. 

When $k = 3$, there is no restriction on the cumulants, and (6.11) becomes
$$D(Z_n) = \frac{1}{2n}\sum_{|\nu|=3}\frac{\gamma_\nu^2}{\nu!} + O\Big(\frac{1}{n^2}\Big) + o\Big(\frac{1}{n^{(m-2)/2}\,(\log n)^{(m-d)/2}}\Big).$$
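In coordinates, the leading coefficient $\frac12\sum_{|\nu|=3}\gamma_\nu^2/\nu!$ for $k = 3$ equals $\frac{1}{12}\sum_{i,j,l}\kappa_{ijl}^2$, where $\kappa_{ijl}$ denotes the symmetric third-cumulant tensor of $X_1$ and $\gamma_\nu = \kappa_{ijl}$ for the multiset $\{i,j,l\}$ described by $\nu$: each multi-index with $|\nu| = 3$ corresponds to exactly $3!/\nu!$ ordered triples. The Python check below uses a randomly generated symmetric tensor in dimension $d = 3$ (purely illustrative data; the helper names are ours).

```python
import itertools
import math
import random


def multi_index(triple, d):
    """Multi-index nu = (nu_1, ..., nu_d) counting how often each coordinate occurs in the triple."""
    nu = [0] * d
    for i in triple:
        nu[i] += 1
    return tuple(nu)


def c1_multi_index(kappa, d):
    """(1/2) * sum over |nu| = 3 of gamma_nu^2 / nu!."""
    total = 0.0
    for triple in itertools.combinations_with_replacement(range(d), 3):
        nu = multi_index(triple, d)
        total += kappa[triple] ** 2 / math.prod(math.factorial(v) for v in nu)
    return total / 2


def c1_tensor(kappa, d):
    """(1/12) * sum over all ordered triples (i, j, l) of kappa_ijl^2."""
    total = sum(kappa[tuple(sorted(t))] ** 2 for t in itertools.product(range(d), repeat=3))
    return total / 12


d = 3
random.seed(1)
# symmetric tensor stored by sorted index triple (illustrative random values)
kappa = {t: random.uniform(-1, 1) for t in itertools.combinations_with_replacement(range(d), 3)}
print(c1_multi_index(kappa, d), c1_tensor(kappa, d))
```

In dimension one this reduces to $\gamma_3^2/12$, matching (6.10) with $k = 3$.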



If $\mathbb{E}|X_1|^4 < +\infty$, we get $D(Z_n) = O(1/n)$ for $d \le 4$, and only $D(Z_n) = o\big((\log n)^{(d-4)/2}/n\big)$ for $d \ge 5$. However, if $\mathbb{E}|X_1|^5 < +\infty$, we always have $D(Z_n) = O(1/n)$ regardless of the dimension $d$.

Technically, this slight difference between the conclusions for different dimensions is due to the dimension-dependent asymptotic $\int_{|x|>T}|x|^2\varphi(x)\,dx \sim C_d\,T^d e^{-T^2/2}$.

7. Convolutions of mixtures of normal laws 

Is the asymptotic description of $D(Z_n)$ in Theorem 1.1 still optimal if no expansion terms of order $n^{-j}$ are present? This is exactly the case for $2 < s < 4$.

In order to answer the question, we examine one special class of probability distributions that can be described as mixtures of normal laws on the real line with mean zero. They have densities of the form
$$p(x) = \int_0^{+\infty}\varphi_\sigma(x)\,dP(\sigma) \qquad (x \in \mathbb{R}), \qquad (7.1)$$
where $P$ is a (mixing) probability measure on the positive half-axis $(0,+\infty)$, and where
$$\varphi_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/(2\sigma^2)}$$
is the density of the normal law with mean zero and variance $\sigma^2$. (As usual, we write $\varphi(x)$ in the standard normal case with $\sigma = 1$.)

Equivalently, let $p(x)$ denote the density of the random variable $X_1 = \rho Z$, where the factors $Z \sim N(0,1)$ and $\rho > 0$, having the distribution $P$, are independent. Such distributions appear naturally, for example, as limit laws in randomized models of summation (cf. e.g. [B-G]).

For densities such as (7.1), we need a refinement of the local limit theorem for convolutions described in the expansions (3.5)-(3.6). More precisely, our aim is to find a representation with an essentially smaller remainder term compared to $o(n^{-(s-2)/2})$.

Thus, let $X_1, X_2, \dots$ be independent random variables having a common density $p(x)$ as in (7.1), and let $p_n(x)$ denote the density of the normalized sum $Z_n = (X_1+\dots+X_n)/\sqrt n$. If $X_1 = \rho Z$, where $Z \sim N(0,1)$ and $\rho > 0$ are independent, then $\mathbb{E}X_1^2 = \mathbb{E}\rho^2$ and, more generally,
$$\mathbb{E}|X_1|^s = \beta_s\,\mathbb{E}\rho^s = \beta_s\int_0^{+\infty}\sigma^s\,dP(\sigma),$$
where $\beta_s$ denotes the $s$-th absolute moment of $Z$.

Note that $p(x)$ is unimodal with mode at the origin, and $p(0) = \mathbb{E}\,\frac{1}{\rho\sqrt{2\pi}}$. If $\rho \ge \sigma_0 > 0$, the density is bounded, and therefore the entropy $h(X_1)$ is finite.
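For a concrete check of $\mathbb{E}|X_1|^s = \beta_s\,\mathbb{E}\rho^s$, recall that $\beta_s = \mathbb{E}|Z|^s = 2^{s/2}\,\Gamma(\frac{s+1}{2})/\sqrt\pi$. The Python sketch below assumes a hypothetical two-point mixing law ($\rho = 1/2$ or $\rho = \sqrt7/2$ with probability $1/2$ each, so that $\mathbb{E}\rho^2 = 1$ and $\rho \ge \sigma_0 = 1/2$), integrates $|x|^s p(x)$ numerically, compares the result with $\beta_s\,\mathbb{E}\rho^s$, and also evaluates $p(0) = \mathbb{E}\frac{1}{\rho\sqrt{2\pi}}$.

```python
import math

SIGMAS = (0.5, math.sqrt(7) / 2)   # hypothetical mixing law P with two atoms
WEIGHTS = (0.5, 0.5)               # chosen so that E rho^2 = 1


def p(x):
    """Mixture density (7.1) for the two-point mixing law."""
    return sum(w * math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
               for s, w in zip(SIGMAS, WEIGHTS))


def beta(s):
    """s-th absolute moment of a standard normal variable."""
    return 2 ** (s / 2) * math.gamma((s + 1) / 2) / math.sqrt(math.pi)


def abs_moment_numeric(s, upper=40.0, panels=20000):
    """2 * int_0^upper x^s p(x) dx by Simpson's rule (p is even)."""
    f = lambda x: x ** s * p(x)
    h = upper / panels
    total = f(0.0) + f(upper)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(i * h)
    return 2 * total * h / 3


def abs_moment_formula(s):
    """beta_s * E rho^s."""
    return beta(s) * sum(w * sig ** s for sig, w in zip(SIGMAS, WEIGHTS))


for s in (2.0, 3.5, 4.0):
    print(s, abs_moment_numeric(s), abs_moment_formula(s))
print("p(0) =", p(0.0), sum(w / (sig * math.sqrt(2 * math.pi)) for sig, w in zip(SIGMAS, WEIGHTS)))
```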

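Proposition 7.1. Assume that $\mathbb{E}\rho^2 = 1$ and $\mathbb{E}\rho^s < +\infty$ $(2 < s \le 4)$. If $\mathbb{P}\{\rho \ge \sigma_0\} = 1$ with some constant $\sigma_0 > 0$, then, uniformly over all $x$,
$$p_n(x) = \varphi(x) + n\int_0^{+\infty}\big(\varphi_{\sigma_n}(x)-\varphi(x)\big)\,dP(\sigma) + O\Big(\frac{1}{n^{s-2}}\Big), \qquad (7.2)$$
where $\sigma_n = \sqrt{1 + \frac{\sigma^2-1}{n}}$.

Of course, when $\mathbb{E}\rho^s < +\infty$ for $s > 4$, the proposition may still be applied, but with $s = 4$. In this case (7.2) has a remainder term of order $O(\frac{1}{n^2})$.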



Note that $Z_n$ has the same distribution as $\sqrt{\frac{\rho_1^2+\dots+\rho_n^2}{n}}\,Z$, where $\rho_1,\dots,\rho_n$ are independent copies of $\rho$ (independent of $Z$ as well). This representation already indicates the closeness of $p_n$ and $\varphi$ and suggests appealing to the law of large numbers. However, we shall choose a different approach based on the study of the characteristic functions of $Z_n$.
Obviously, the characteristic function of $X_1$ is given by
$$v(t) = \mathbb{E}\,e^{itX_1} = \mathbb{E}\,e^{-\rho^2t^2/2} \qquad (t \in \mathbb{R}).$$

Using Jensen's inequality and the assumption $\rho \ge \sigma_0 > 0$, we get a two-sided estimate
$$e^{-t^2/2} \le v(t) \le e^{-\sigma_0^2t^2/2}. \qquad (7.3)$$

In particular, the function $\psi(t) = e^{t^2/2}v(t) - 1$ is non-negative for all real $t$.

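Lemma 7.2. If $\mathbb{E}\rho^2 = 1$ and $M_s = \mathbb{E}\rho^s < +\infty$ $(2 < s \le 4)$, then, for all $|t| \le 1$,
$$0 \le \psi(t) \le M_s\,|t|^s.$$

Proof. We may assume $0 < t \le 1$. Write $\psi(t) = \mathbb{E}\big(e^{-(\rho^2-1)t^2/2} - 1\big)$. The expression under the expectation sign is non-positive for $\rho t > 1$, so
$$\psi(t) \le \mathbb{E}\big(e^{-(\rho^2-1)t^2/2} - 1\big)\,1_{\{\rho\le1/t\}}.$$
Let $x = -\frac12(\rho^2-1)t^2$. Clearly, $|x| \le 1$ for $\rho \le 1/t$. Using $e^x \le 1 + x + x^2$ $(|x| \le 1)$ and $\mathbb{E}\rho^2 = 1$, we get
$$\psi(t) \le -\frac{t^2}{2}\,\mathbb{E}(\rho^2-1)\,1_{\{\rho\le1/t\}} + \frac{t^4}{4}\,\mathbb{E}(\rho^2-1)^2\,1_{\{\rho\le1/t\}}$$
$$= \frac{t^2}{2}\,\mathbb{E}(\rho^2-1)\,1_{\{\rho>1/t\}} + \frac{t^4}{4}\,\mathbb{E}(\rho^2-1)^2\,1_{\{\rho\le1/t\}}. \qquad (7.4)$$
The last expectation is equal to
$$\mathbb{E}\rho^4\,1_{\{\rho\le1/t\}} - 2\big(1 - \mathbb{E}\rho^2\,1_{\{\rho>1/t\}}\big) + \mathbb{P}\{\rho\le1/t\} \le \mathbb{E}\rho^4\,1_{\{\rho\le1/t\}} + 2\,\mathbb{E}\rho^2\,1_{\{\rho>1/t\}} - 1 \le \mathbb{E}\rho^4\,1_{\{\rho\le1/t\}} + \mathbb{E}\rho^2\,1_{\{\rho>1/t\}}.$$
Together with (7.4) and $t \le 1$, this gives
$$\psi(t) \le \frac{3t^2}{4}\,\mathbb{E}\rho^2\,1_{\{\rho>1/t\}} + \frac{t^4}{4}\,\mathbb{E}\rho^4\,1_{\{\rho\le1/t\}}. \qquad (7.5)$$

Finally, $\mathbb{E}\rho^2\,1_{\{\rho>1/t\}} \le \mathbb{E}\rho^s\,t^{s-2}\,1_{\{\rho>1/t\}} \le M_s\,t^{s-2}$ and $\mathbb{E}\rho^4\,1_{\{\rho\le1/t\}} \le \mathbb{E}\rho^s\,t^{s-4}\,1_{\{\rho\le1/t\}} \le M_s\,t^{s-4}$. It remains to use these estimates in (7.5), and Lemma 7.2 is proved.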
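The bound of Lemma 7.2 can be observed numerically. The Python sketch below assumes a hypothetical two-point mixing law with a moderately heavy atom ($\rho = 0.8$ with probability $q$ and $\rho = 3$ with probability $1-q$, where $q$ is chosen so that $\mathbb{E}\rho^2 = 1$), computes $\psi(t) = \mathbb{E}\,e^{-(\rho^2-1)t^2/2} - 1$ with `math.expm1` to limit cancellation, and checks $0 \le \psi(t) \le M_s t^s$ on a grid of $t \in (0,1]$ for several $s \in (2,4]$.

```python
import math

ATOMS = (0.8, 3.0)                     # hypothetical values of rho
Q = (9.0 - 1.0) / (9.0 - 0.64)         # P(rho = 0.8), so that E rho^2 = 1
PROBS = (Q, 1.0 - Q)


def psi(t):
    """psi(t) = E exp(-(rho^2 - 1) t^2 / 2) - 1, computed with expm1."""
    return sum(p * math.expm1(-(r * r - 1.0) * t * t / 2) for r, p in zip(ATOMS, PROBS))


def moment(s):
    """M_s = E rho^s."""
    return sum(p * r ** s for r, p in zip(ATOMS, PROBS))


def max_ratio(s, points=100):
    """max over t in (0, 1] of psi(t) / (M_s t^s); Lemma 7.2 asserts that this is at most 1."""
    ts = [(i + 1) / points for i in range(points)]
    return max(psi(t) / (moment(s) * t ** s) for t in ts)


for s in (2.5, 3.0, 4.0):
    print(s, max_ratio(s), min(psi((i + 1) / 100) for i in range(100)))
```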

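Proof of Proposition 7.1. The characteristic functions $v_n(t) = v(\frac{t}{\sqrt n})^n$ of $Z_n$ are real-valued and admit, by (7.3), similar bounds
$$e^{-t^2/2} \le v_n(t) \le e^{-\sigma_0^2t^2/2}. \qquad (7.6)$$
In particular, one may apply the inverse Fourier transform to represent the density of $Z_n$ as
$$p_n(x) = \frac{1}{2\pi}\int e^{-itx}v_n(t)\,dt = \frac{1}{2\pi}\int e^{-itx-t^2/2}\big(1+\psi(t/\sqrt n)\big)^n\,dt.$$

Letting $T_n = \frac{4}{\sigma_0}\sqrt{\log n}$, we split the integral into the two regions, defined by
$$I_1 = \int_{|t|\le T_n}e^{-itx}v_n(t)\,dt, \qquad I_2 = \int_{|t|>T_n}e^{-itx}v_n(t)\,dt.$$

By the upper bound in (7.6), for $n \ge 2$,
$$|I_2| \le \int_{|t|>T_n}e^{-\sigma_0^2t^2/2}\,dt \le \frac{2}{\sigma_0^2T_n}\,e^{-\sigma_0^2T_n^2/2} < \frac{1}{\sigma_0\,n^8}. \qquad (7.7)$$

In the interval $|t| \le T_n$, by Lemma 7.2, $\psi(\frac{t}{\sqrt n}) \le M_s\frac{|t|^s}{n^{s/2}} \le \frac1n$ for all $n \ge n_0$. But for $0 \le \varepsilon \le \frac1n$ there is a simple estimate $0 \le (1+\varepsilon)^n - 1 - n\varepsilon \le 2(n\varepsilon)^2$. Hence, once more by Lemma 7.2,
$$0 \le \big(1+\psi(t/\sqrt n)\big)^n - 1 - n\psi(t/\sqrt n) \le 2\big(n\psi(t/\sqrt n)\big)^2 \le 2M_s^2\,\frac{|t|^{2s}}{n^{s-2}} \qquad (n \ge n_0).$$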
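The elementary estimate $0 \le (1+\varepsilon)^n - 1 - n\varepsilon \le 2(n\varepsilon)^2$ for $0 \le \varepsilon \le \frac1n$ follows from $\binom{n}{k}\varepsilon^k \le (n\varepsilon)^k/k!$ and $\sum_{k\ge2}1/k! = e-2 < 2$. It can be spot-checked as follows; the function name is illustrative, and `expm1`/`log1p` are used only to avoid cancellation.

```python
import math


def binomial_remainder(n, eps):
    """(1 + eps)^n - 1 - n * eps, computed stably."""
    return math.expm1(n * math.log1p(eps)) - n * eps


worst = 0.0
for n in (1, 2, 5, 10, 100, 10_000):
    for i in range(1, 51):
        eps = i / (50 * n)                      # 0 < eps <= 1/n
        r = binomial_remainder(n, eps)
        assert r >= -1e-15                      # non-negativity (up to rounding)
        worst = max(worst, r / (n * eps) ** 2)
print("largest ratio to (n*eps)^2:", worst)   # stays below e - 2 < 2
```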

This gives
$$\Big|I_1 - \int_{|t|\le T_n}e^{-itx-t^2/2}\big(1+n\psi(t/\sqrt n)\big)\,dt\Big| \le \frac{2M_s^2}{n^{s-2}}\int_{-\infty}^{+\infty}|t|^{2s}e^{-t^2/2}\,dt. \qquad (7.8)$$

In addition,
$$\int_{|t|>T_n}e^{-t^2/2}\big(1+n\psi(t/\sqrt n)\big)\,dt \le \int_{|t|>T_n}e^{-t^2/2}\,dt + n\int_{|t|>T_n}e^{-t^2/2}\,\psi(t/\sqrt n)\,dt.$$

Here, the first integral on the right-hand side is of order $O(n^{-8})$. To estimate the second one, recall that, by (7.3), $\psi(t) = e^{t^2/2}v(t) - 1 \le e^{(1-\sigma_0^2)t^2/2}$. Hence, $\psi(t/\sqrt n) \le e^{(1-\sigma_0^2)t^2/(2n)}$ and
$$\int_{|t|>T_n}e^{-t^2/2}\,\psi(t/\sqrt n)\,dt \le \int_{|t|>T_n}e^{-\sigma_0^2t^2/2}\,dt < \frac{1}{\sigma_0\,n^8}.$$

Together with (7.7) and (7.8), these bounds imply that
$$p_n(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}e^{-itx-t^2/2}\big(1+n\psi(t/\sqrt n)\big)\,dt + O\Big(\frac{1}{n^{s-2}}\Big)$$
uniformly over all $x$. It remains to note that
$$\frac{1}{2\pi}\int e^{-itx-t^2/2}\,\psi(t/\sqrt n)\,dt = \frac{1}{2\pi}\int e^{-itx-t^2/2}\big(e^{t^2/(2n)}\,v(t/\sqrt n) - 1\big)\,dt = \int_0^{+\infty}\big(\varphi_{\sigma_n}(x)-\varphi(x)\big)\,dP(\sigma).$$

Proposition 7.1 is proved.
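Proposition 7.1 can be illustrated in a case where $p_n$ is available in closed form. If the (hypothetical) mixing law $P$ has two atoms $a$ and $b$ with probabilities $q$ and $1-q$, then, given the number $K$ of indices with $\rho_i = a$, $Z_n$ is normal with variance $(Ka^2 + (n-K)b^2)/n$, so $p_n$ is a binomial mixture of normal densities. The standard-library Python sketch below (names are ours) compares this exact $p_n$ with $\varphi$ and with the right-hand side of (7.2) without its remainder term. Since $\rho$ is bounded here, (7.2) applies with $s = 4$, so one expects the refined error to be of order $n^{-2}$, against order $n^{-1}$ for $\varphi$ alone.

```python
import math

A, B, Q = 0.5, math.sqrt(7) / 2, 0.5   # hypothetical two-point mixing law with E rho^2 = 1


def normal_pdf(x, sigma=1.0):
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def exact_density(x, n):
    """Density of Z_n: a Binomial(n, Q) mixture of centred normal densities."""
    total = 0.0
    for k in range(n + 1):
        weight = math.comb(n, k) * Q ** k * (1 - Q) ** (n - k)
        variance = (k * A * A + (n - k) * B * B) / n
        total += weight * normal_pdf(x, math.sqrt(variance))
    return total


def refined_density(x, n):
    """phi(x) + n * int (phi_{sigma_n}(x) - phi(x)) dP(sigma), i.e. (7.2) without the remainder."""
    def sigma_n(s):
        return math.sqrt(1 + (s * s - 1) / n)
    correction = (Q * (normal_pdf(x, sigma_n(A)) - normal_pdf(x))
                  + (1 - Q) * (normal_pdf(x, sigma_n(B)) - normal_pdf(x)))
    return normal_pdf(x) + n * correction


def max_errors(n, points=201):
    """Sup-norm errors on [-5, 5] of phi alone and of the refined approximation."""
    plain = refined = 0.0
    for i in range(points):
        x = -5 + 10 * i / (points - 1)
        exact = exact_density(x, n)
        plain = max(plain, abs(exact - normal_pdf(x)))
        refined = max(refined, abs(exact - refined_density(x, n)))
    return plain, refined


for n in (25, 50, 100):
    print(n, max_errors(n))
```

Doubling $n$ should roughly halve the first error and roughly quarter the second.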

Remark 7.3. An inspection of (7.5) shows that, in the case $2 < s < 4$, Lemma 7.2 may be slightly sharpened to $\psi(t) = o(|t|^s)$ as $t \to 0$. Correspondingly, the $O$-relation in Proposition 7.1 can be replaced with an $o$-relation. This improvement is convenient, but not crucial for the proof of Theorem 1.3.

8. Lower bounds. Proof of Theorem 1.3 

Let $X_1, X_2, \dots$ be independent random variables with a common density of the form
$$p(x) = \int_0^{+\infty}\varphi_\sigma(x)\,dP(\sigma), \qquad x \in \mathbb{R}.$$

Equivalently, let $X_1 = \rho Z$ with independent random variables $Z \sim N(0,1)$ and $\rho > 0$ having distribution $P$.

A basic tool for proving Theorem 1.3 will be the following lower bound on the entropic distance to Gaussianity for the partial sums $S_n = X_1 + \dots + X_n$.

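Proposition 8.1. Let $\mathbb{E}\rho^2 = 1$, $\mathbb{E}\rho^s < +\infty$ $(2 < s < 4)$, and $\mathbb{P}\{\rho \ge \sigma_0\} = 1$ with $\sigma_0 > 0$. Assume that, for some $\gamma > 0$,
$$\liminf_{n\to\infty}\,n^{s-\frac12}\int_{n^{\frac12+\gamma}}^{+\infty}\frac{1}{\sigma}\,dP(\sigma) > 0. \qquad (8.1)$$
Then, with some absolute constant $c > 0$ and some constant $\delta > 0$,
$$D(S_n) \ge c\,n\log n\;\mathbb{P}\{\rho \ge \sqrt{n\log n}\} + o\Big(\frac{1}{n^{\frac{s-2}{2}+\delta}}\Big). \qquad (8.2)$$

In fact, in (8.2) one may take any positive number $\delta < \min\{\gamma s, \frac{s-2}{2}\}$.
Proof. By Proposition 7.1 and Remark 7.3, uniformly over all $x$,
$$p_n(x) = \varphi(x) + n\int_0^{+\infty}\big(\varphi_{\sigma_n}(x)-\varphi(x)\big)\,dP(\sigma) + o\Big(\frac{1}{n^{s-2}}\Big), \qquad (8.3)$$
where $p_n$ is the density of $S_n/\sqrt n$ and $\sigma_n = \sqrt{1 + \frac{\sigma^2-1}{n}}$.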
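Define the sequence
$$N_n = \frac{n^{\frac12+\gamma}}{5\sqrt{\log n}}$$
for $n$ large enough (so that $N_n > 1$). By Chebyshev's inequality,
$$\mathbb{P}\{\rho \ge N_n\} \le 5^s M_s\,\frac{(\log n)^{s/2}}{n^{s(\frac12+\gamma)}} = o\Big(\frac{1}{n^{\frac s2+\delta}}\Big), \qquad 0 < \delta < \gamma s. \qquad (8.4)$$
Using $u\log u \ge u - 1$ $(u > 0)$ and applying (8.3), we may write
$$I_n = \int_{|x|\le4\sqrt{\log n}}p_n(x)\log\frac{p_n(x)}{\varphi(x)}\,dx \ge \int_{|x|\le4\sqrt{\log n}}\big(p_n(x)-\varphi(x)\big)\,dx$$
$$\ge n\int_0^{+\infty}\int_{|x|\le4\sqrt{\log n}}\big(\varphi_{\sigma_n}(x)-\varphi(x)\big)\,dx\,dP(\sigma) - C\,\frac{\sqrt{\log n}}{n^{s-2}} \qquad (8.5)$$
with some constant $C$.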

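Note that $\sigma_n \le 1$ for $\sigma \le 1$, and then, for any $T > 0$,
$$\int_{-T}^{T}\big(\varphi_{\sigma_n}(x)-\varphi(x)\big)\,dx = 2\big(\Phi(T/\sigma_n) - \Phi(T)\big) \ge 0,$$
where $\Phi$ denotes the distribution function of the standard normal law. Hence, the outer integral in (8.5) may be restricted to the values $\sigma \ge 1$. Moreover, by (8.4), one may also restrict this integral to the values $\sigma \le N_n$. More precisely, (8.4) gives
$$n\,\Big|\int_{N_n}^{+\infty}\int_{|x|\le4\sqrt{\log n}}\big(\varphi_{\sigma_n}(x)-\varphi(x)\big)\,dx\,dP(\sigma)\Big| \le n\,\mathbb{P}\{\rho \ge N_n\} = o\Big(\frac{1}{n^{\frac{s-2}{2}+\delta}}\Big).$$

Comparing this relation with (8.5) and imposing the additional requirement $\delta < \frac{s-2}{2}$, we get
$$I_n \ge n\int_1^{N_n}\int_{|x|\le4\sqrt{\log n}}\big(\varphi_{\sigma_n}(x)-\varphi(x)\big)\,dx\,dP(\sigma) + o\Big(\frac{1}{n^{\frac{s-2}{2}+\delta}}\Big)$$
$$= -2n\int_1^{N_n}\int_{4\sqrt{\log n}/\sigma_n}^{4\sqrt{\log n}}\varphi(x)\,dx\,dP(\sigma) + o\Big(\frac{1}{n^{\frac{s-2}{2}+\delta}}\Big). \qquad (8.6)$$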



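Now, let us estimate $p_n(x)$ from below in the region $4\sqrt{\log n} \le |x| \le n^\gamma$. If $|x| \ge 4\sqrt{\log n}$, it follows from (8.3) that
$$p_n(x) = n\int_0^{+\infty}\varphi_{\sigma_n}(x)\,dP(\sigma) + o\Big(\frac{1}{n^{s-2}}\Big). \qquad (8.7)$$

Consider the function
$$g_n(x) = \int_0^{+\infty}\frac{\varphi_{\sigma_n}(x)}{\varphi(x)}\,dP(\sigma).$$

Note that $1 \le \sigma_n \le \sigma$ for $\sigma \ge 1$. In this case, the ratio $\frac{\varphi_{\sigma_n}(x)}{\varphi(x)}$ is non-decreasing in $x > 0$. Moreover, for $\sigma \ge \sqrt{3n+1}$, we have $\sigma_n^2 = 1 + \frac{\sigma^2-1}{n} \ge 4$, so $1 - \frac{1}{\sigma_n^2} \ge \frac34$. Hence, for $|x| \ge 4\sqrt{\log n}$,
$$\frac{\varphi_{\sigma_n}(x)}{\varphi(x)} = \frac{1}{\sigma_n}\,e^{\frac{x^2}{2}\left(1-\frac{1}{\sigma_n^2}\right)} \ge \frac{n^6}{\sigma_n} \ge \frac{n^6}{\sigma}.$$

Therefore,
$$g_n(x) \ge n^6\int_{\sqrt{3n+1}}^{+\infty}\frac{1}{\sigma}\,dP(\sigma).$$

But by the assumption (8.1), the last expression tends to infinity with $n$, so, for all $n$ large enough, $g_n(x) \ge 2$ in the region $|x| \ge 4\sqrt{\log n}$.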
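The pointwise bound $\varphi_{\sigma_n}(x)/\varphi(x) \ge n^6/\sigma$ for $\sigma \ge \sqrt{3n+1}$ and $|x| \ge 4\sqrt{\log n}$, used for $g_n$, is easy to test numerically. The Python sketch below works with logarithms to avoid overflow; the grid of $n$, $\sigma$ and $x$ values is arbitrary, and the function names are illustrative.

```python
import math


def log_ratio(x, sigma, n):
    """log(phi_{sigma_n}(x) / phi(x)) with sigma_n^2 = 1 + (sigma^2 - 1) / n."""
    s2 = 1 + (sigma * sigma - 1) / n
    return -0.5 * math.log(s2) + 0.5 * x * x * (1 - 1 / s2)


def worst_margin(n):
    """min of log_ratio - log(n^6 / sigma) over sigma >= sqrt(3n+1) and |x| >= 4 sqrt(log n) (grid)."""
    x0 = 4 * math.sqrt(math.log(n))
    sigma0 = math.sqrt(3 * n + 1)
    margin = math.inf
    for c in (1.0, 1.5, 3.0, 10.0, 1e3, 1e6):
        sigma = c * sigma0
        for dx in (0.0, 0.5, 2.0, 10.0):
            x = x0 + dx
            margin = min(margin, log_ratio(x, sigma, n) - (6 * math.log(n) - math.log(sigma)))
    return margin


for n in (10, 100, 10_000):
    print(n, worst_margin(n))
```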

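Next, if $\sigma \ge |x|\sqrt n$, then $\sigma_n^2 = 1 + \frac{\sigma^2-1}{n} \ge x^2$, so $\frac{x^2}{\sigma_n^2} \le 1$. On the other hand,
$$\sigma_n^2 \le 1 + \frac{\sigma^2}{n} \le \frac{2\sigma^2}{n},$$
since $\sigma^2 \ge x^2 n \ge n$, as $|x| \ge 4\sqrt{\log n} > 1$ for $n \ge 2$. The two estimates give
$$\varphi_{\sigma_n}(x) = \frac{1}{\sigma_n\sqrt{2\pi}}\,e^{-x^2/(2\sigma_n^2)} \ge \frac{\sqrt n}{\sqrt{4\pi e}\;\sigma}.$$

Therefore, whenever $4\sqrt{\log n} \le |x| \le n^\gamma$,
$$n\int_0^{+\infty}\varphi_{\sigma_n}(x)\,dP(\sigma) \ge \frac{n^{3/2}}{\sqrt{4\pi e}}\int_{|x|\sqrt n}^{+\infty}\frac{1}{\sigma}\,dP(\sigma) \ge \frac{n^{3/2}}{\sqrt{4\pi e}}\int_{n^{\frac12+\gamma}}^{+\infty}\frac{1}{\sigma}\,dP(\sigma).$$

By the assumption (8.1), the last expression, and therefore the left integral, are larger than $\frac{c}{n^{s-2}}$ with some constant $c > 0$. Consequently, the remainder term in (8.7) is indeed smaller, so that, for all $n$ large enough, we may write, for example,
$$p_n(x) \ge 0.8\,n\int_0^{+\infty}\varphi_{\sigma_n}(x)\,dP(\sigma) = 0.8\,n\,g_n(x)\,\varphi(x) \qquad \big(4\sqrt{\log n} \le |x| \le n^\gamma\big).$$

Since $g_n(x) \ge 2$ for $|x| \ge 4\sqrt{\log n}$ with large $n$, we have in this region $\frac{p_n(x)}{\varphi(x)} \ge 1.6\,n \ge n$, so
$$p_n(x)\log\frac{p_n(x)}{\varphi(x)} \ge p_n(x)\log n \ge 0.8\,n\log n\int_0^{+\infty}\varphi_{\sigma_n}(x)\,dP(\sigma).$$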
fix) Jo 

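Hence,
$$\int_{4\sqrt{\log n}\le|x|\le n^\gamma}p_n(x)\log\frac{p_n(x)}{\varphi(x)}\,dx \ge 0.8\,n\log n\int_0^{+\infty}\int_{4\sqrt{\log n}\le|x|\le n^\gamma}\varphi_{\sigma_n}(x)\,dx\,dP(\sigma)$$
$$= 1.6\,n\log n\int_0^{+\infty}\int_{4\sqrt{\log n}/\sigma_n}^{n^\gamma/\sigma_n}\varphi(x)\,dx\,dP(\sigma). \qquad (8.8)$$

At this point, it is useful to note that $\frac{n^\gamma}{\sigma_n} \ge 4\sqrt{\log n}$, as long as $\sigma \le N_n$ with $n$ large enough. Indeed, in this case $\sigma_n^2 \le (1 - \frac1n) + \frac{N_n^2}{n} \le 1 + \frac{n^{2\gamma}}{25\log n}$, so
$$\big(4\sigma_n\sqrt{\log n}\big)^2 \le 16\log n\,\Big(1 + \frac{n^{2\gamma}}{25\log n}\Big) < n^{2\gamma}$$
for all $n$ large enough. Hence, from (8.8),
$$\int_{4\sqrt{\log n}\le|x|\le n^\gamma}p_n(x)\log\frac{p_n(x)}{\varphi(x)}\,dx \ge 1.6\,n\log n\int_1^{N_n}\int_{4\sqrt{\log n}/\sigma_n}^{4\sqrt{\log n}}\varphi(x)\,dx\,dP(\sigma).$$

But the last expression dominates the double integral in (8.6) with a factor of $2n$. Therefore, combining the above estimate with (8.6), we get
$$\int_{|x|\le n^\gamma}p_n(x)\log\frac{p_n(x)}{\varphi(x)}\,dx \ge 1.4\,n\log n\int_1^{N_n}\int_{4\sqrt{\log n}/\sigma_n}^{4\sqrt{\log n}}\varphi(x)\,dx\,dP(\sigma) + o\Big(\frac{1}{n^{\frac{s-2}{2}+\delta}}\Big).$$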

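Finally, we may extend the outer integral on the right-hand side to all values $\sigma \ge 1$ by noting that, by (8.4),
$$n\log n\int_{N_n}^{+\infty}\int_{4\sqrt{\log n}/\sigma_n}^{4\sqrt{\log n}}\varphi(x)\,dx\,dP(\sigma) \le n\log n\;\mathbb{P}\{\rho \ge N_n\} = o\Big(\frac{1}{n^{\frac{s-2}{2}+\delta}}\Big).$$

Hence,
$$\int_{|x|\le n^\gamma}p_n(x)\log\frac{p_n(x)}{\varphi(x)}\,dx \ge 1.4\,n\log n\int_1^{+\infty}\int_{4\sqrt{\log n}/\sigma_n}^{4\sqrt{\log n}}\varphi(x)\,dx\,dP(\sigma) + o\Big(\frac{1}{n^{\frac{s-2}{2}+\delta}}\Big). \qquad (8.9)$$

For the remaining values $|x| > n^\gamma$, one can just use the property $u\log u \ge -\frac1e$ to get a simple lower bound
$$\int_{|x|>n^\gamma}p_n(x)\log\frac{p_n(x)}{\varphi(x)}\,dx \ge \int_{|x|>n^\gamma,\;p_n(x)<\varphi(x)}p_n(x)\log\frac{p_n(x)}{\varphi(x)}\,dx \ge -\frac1e\int_{|x|>n^\gamma}\varphi(x)\,dx \ge -e^{-n^{2\gamma}/2}.$$

Together with (8.9), this yields
$$\int_{-\infty}^{+\infty}p_n(x)\log\frac{p_n(x)}{\varphi(x)}\,dx \ge 1.4\,n\log n\int_1^{+\infty}\int_{4\sqrt{\log n}/\sigma_n}^{4\sqrt{\log n}}\varphi(x)\,dx\,dP(\sigma) + o\Big(\frac{1}{n^{\frac{s-2}{2}+\delta}}\Big).$$

To simplify, finally note that $\frac{4}{\sigma_n}\sqrt{\log n} < 4$ for $\sigma \ge \sqrt{n\log n}$. In this case the last inner integral is separated from zero (for large $n$); hence, with some absolute constant $c > 0$,
$$D(S_n) = \int_{-\infty}^{+\infty}p_n(x)\log\frac{p_n(x)}{\varphi(x)}\,dx \ge c\,n\log n\;\mathbb{P}\{\rho \ge \sqrt{n\log n}\} + o\Big(\frac{1}{n^{\frac{s-2}{2}+\delta}}\Big).$$

This is exactly the required inequality (8.2), and Proposition 8.1 is proved.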

Proof of Theorem 1.3. Given $\eta > 0$, one may apply Proposition 8.1 to the probability measure $P$ with density
$$\frac{dP(\sigma)}{d\sigma} = \frac{c_\eta}{\sigma^{s+1}(\log\sigma)^\eta}, \qquad \sigma \ge 2,$$
extending it to an interval $[\sigma_0, 2]$ so as to meet the requirement $\int_{\sigma_0}^{+\infty}\sigma^2\,dP(\sigma) = 1$ (with some $0 < \sigma_0 < 1$ and a normalizing constant $c_\eta > 0$). It is easy to see that in this case the condition (8.1) is fulfilled for $0 < \gamma < \frac{s-2}{2(s+1)}$. In addition, if $\rho$ has the distribution $P$, we have
$$\mathbb{P}\{\rho \ge \sigma\} \ge \frac{\mathrm{const}}{\sigma^s(\log\sigma)^\eta}$$
for all $\sigma$ large enough. Hence, by taking $\sigma = \sqrt{n\log n}$, (8.2) provides the desired lower bound.
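To see why the density $c_\eta\,\sigma^{-s-1}(\log\sigma)^{-\eta}$ yields $\mathbb{P}\{\rho \ge \sigma\} \ge \mathrm{const}\cdot\sigma^{-s}(\log\sigma)^{-\eta}$, substitute $u = \sigma e^y$: $\int_\sigma^\infty u^{-s-1}(\log u)^{-\eta}\,du = \frac{\sigma^{-s}(\log\sigma)^{-\eta}}{s}\,R(\sigma)$ with $R(\sigma) = s\int_0^\infty e^{-sy}(1+y/\log\sigma)^{-\eta}\,dy$, which is non-decreasing in $\sigma$, bounded by $1$, and tends to $1$. The Python sketch below uses illustrative parameter values $s = 3$, $\eta = 1$, evaluates $R$ by Simpson's rule, and leaves out the normalizing constant $c_\eta$.

```python
import math


def R(sigma, s, eta, panels=4000):
    """R(sigma) = s * int_0^inf exp(-s y) (1 + y / log sigma)^(-eta) dy, truncated at y = 50 / s."""
    log_sigma = math.log(sigma)
    f = lambda y: math.exp(-s * y) * (1 + y / log_sigma) ** (-eta)
    upper = 50.0 / s
    h = upper / panels
    total = f(0.0) + f(upper)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(i * h)
    return s * total * h / 3


s, eta = 3.0, 1.0
values = [R(sig, s, eta) for sig in (2.0, 10.0, 1e3, 1e6, 1e12)]
print(values)

# With sigma = sqrt(n log n), the lower bound in (8.2) then behaves like
# (n log n)^(-(s-2)/2) * (log n)^(-eta), up to constants.
```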
Remark. In case $s = 2$ (that is, with minimal moment assumptions), mixtures of normal laws with discrete mixing measures $P$ were used by Matskyavichyus [M] in the central limit theorem in terms of the Kolmogorov distance. Namely, it is shown that, for any prescribed sequence $\varepsilon_n \to 0$, one may choose $P$ such that $\Delta_n = \sup_x|F_n(x) - \Phi(x)| \ge \varepsilon_n$ for all $n$ large enough (where $F_n$ is the distribution function of $Z_n$). In view of the Pinsker-type inequality, one may conclude that
$$D(Z_n) \ge \frac12\,\Delta_n^2 \ge \frac12\,\varepsilon_n^2.$$
Therefore, $D(Z_n)$ may decay arbitrarily slowly.